Preface


In many practical fields, including engineering, medicine, and finance, among others, right or left skewness, 
bi-modality, or multi-modality are characteristics of data sets that can be modelled using statistical 
distributions. Because of their straightforward shapes and identifiability characteristics, well-known 
distributions, such as normal, Weibull, gamma, and Lindley, are frequently utilised. However, during the 
past ten years, much research has concentrated on the more flexible and complicated Generalized or simply 
G families of continuous distributions to improve their modelling capabilities by including one or more shape 
parameters. 

This book compiles new results on such distributions that are valuable in both theory and application. 
It is motivated by the fact that adding one or more parameters to a distribution function makes it more 
versatile and flexible for analysing data. The book also examines the characteristics of a few novel G 
families and how they might be used for statistical inference. The results collected here add to 
those already available. 

The primary goal of our book is to compile, in a single edited volume, recent contributions made by 
diverse authors in the field of G families. This book will help present 
and future scholars studying the G family of probability distributions to generate additional new univariate 
continuous G families of probability distributions; to derive valuable mathematical properties, including 
entropies, order statistics, quantile spread ordering, ordinary and incomplete moments, moment generating 
functions, and residual life and reversed residual life functions, among others; and to apply the Farlie-Gumbel-Morgenstern 
copula, the modified Farlie-Gumbel-Morgenstern copula, the Clayton copula, the Renyi entropy 
copula and the Ali-Mikhail-Haq copula for deriving bivariate and multivariate extensions of the new and 
existing G families. 

This book stands out because it includes a lot of new G families, each with its characteristics and 
applications to diverse real datasets and simulation studies utilising various estimation methods. In the field 
of statistical modelling, the book deals with analysing and studying actual data that differ in nature and shape. 
Chapter 1 


A New Compound G Family 
of Distributions 


Properties, Copulas, Characterizations, Real Data 
Applications with Different Methods of Estimation 


M Masoom Ali, Nadeem Shafique Butt, GG Hamedani, Saralees Nadarajah, 
Haitham M Yousof and Mohamed Ibrahim 


1. Introduction 


The statistical literature contains many new G families of continuous distributions which have been generated 
either by merging (compounding) common G families of continuous distributions or by adding one or more 
parameters to the G family. These novel G families have been employed for modeling real-life datasets in 
many applied studies such as insurance, engineering, econometrics, biology, medicine, statistical forecasting, 
and environmental sciences. Refer to Gupta et al. (1998) for the exponentiated-G family, Marshall and 
Olkin (1997) for the Marshall-Olkin-G family, Eugene et al. (2002) for beta generalized-G family, Yousof 
et al. (2015) for the transmuted exponentiated generalized-G family, Nofal et al. (2017) for the generalized 
transmuted-G family, Rezaei et al. (2017) for the Topp Leone generated family, Merovci et al. (2017) for the 
exponentiated transmuted-G family, Brito et al. (2017) for the Topp-Leone odd log-logistic-G family, Yousof 
et al. (2017a) for the Burr type X-G family, Aryal and Yousof (2017) for the exponentiated generalized-G Poisson 
family, Hamedani et al. (2017) for type I general exponential class of distributions, Cordeiro et al. (2018) 
for Burr type XII G family, Korkmaz et al. (2018a) for the exponential Lindley odd log-logistic-G family, 
Korkmaz et al. (2018b) for the Marshall-Olkin generalized-G Poisson family, Yousof et al. (2018) for the 
Burr-Hatke family of distributions, Hamedani et al. (2018) for the extended G family, Hamedani et al. (2019) for 
the type II general exponential G family, Nascimento et al. (2019) for the odd Nadarajah-Haghighi family of 
distributions, Yousof et al. (2020) for the Weibull G Poisson family, Karamikabir et al. (2020) for the Weibull 
Topp-Leone generated family, Merovci et al. (2020) for the Poisson Topp Leone G family, Korkmaz et al. 
(2020) for the Hjorth’s IDB generator of distributions, Alizadeh et al. (2020a) for flexible Weibull generated 
family of distributions, Alizadeh et al. (2020b) for the transmuted odd log-logistic-G family, Altun et al. 
(2021) for the Gudermannian generated family of distributions and El-Morshedy et al. (2021) for the Poisson 
generalized exponential G family among others. For more new G families see Hamedani et al. (2021) and 
Hamedani et al. (2022). 

The cumulative distribution function (CDF) and the probability density function (PDF) of the Topp 
Leone generated G (TLG-G) family (Rezaei et al. (2017)) of distributions are specified by 

$$F_{\alpha,\beta,W}(x) = \{G_W(x)^{\beta}[2 - G_W(x)^{\beta}]\}^{\alpha} \tag{1}$$

and 

$$h_{\alpha,\beta,W}(x) = 2\alpha\beta\, g_W(x)\, G_W(x)^{\alpha\beta-1}[1 - G_W(x)^{\beta}][2 - G_W(x)^{\beta}]^{\alpha-1}, \tag{2}$$

respectively, where $W$ refers to the parameter vector of the baseline model. For $\beta = 1$, the TLG-G 
family reduces to the Topp Leone G (TL-G) family. Let $Z_1, Z_2, \ldots, Z_N$ be independent identically 
distributed random variables with a common CDF of the TLG-G family and let $N$ be a random 
variable with 

$$P(N = n; a) = \frac{1}{e^{a}-1}\,\frac{a^{n}}{n!}, \quad n \in \mathbb{N},\ a > 0,$$

and define $M_N = \max\{Z_1, Z_2, \ldots, Z_N\}$. Then, 

$$F(x) = \sum_{n=1}^{\infty} P(M_N \le x \mid N = n)\, P(N = n). \tag{3}$$

Using equations (1) and (3), we can write 

$$F_P(x) = C(a)\left[1 - \exp\left(-a\{G_W(x)^{\beta}[2 - G_W(x)^{\beta}]\}^{\alpha}\right)\right], \quad a \in \mathbb{R} - \{0\},\ \alpha > 0,\ \beta > 0, \tag{4}$$

where $C(a) = 1/[1 - \exp(-a)]$ and $P = (a, \alpha, \beta, W)$. Equation (4) is called the Poisson Topp Leone generated-G 
(PTLG-G) family of distributions. The new CDF in (4) can be used for presenting a new discrete G family 
for modeling count data (refer to Aboraya et al. (2020), Chesneau et al. (2021), Ibrahim et al. (2021) and 
Yousof et al. (2021) for more details). The corresponding PDF of the PTLG-G family can be expressed as 

$$f_P(x) = 2a\alpha\beta\, C(a)\, \frac{g_W(x)\, G_W(x)^{\alpha\beta-1}[1 - G_W(x)^{\beta}][2 - G_W(x)^{\beta}]^{\alpha-1}}{\exp\left(a\{G_W(x)^{\beta}[2 - G_W(x)^{\beta}]\}^{\alpha}\right)}, \quad a \in \mathbb{R} - \{0\},\ \alpha > 0,\ \beta > 0. \tag{5}$$

For $\beta = 1$, the PTLG-G family reduces to the Poisson Topp Leone G (PTL-G) family (Merovci et al. (2020)). 
For $a = 1$, the PTLG-G family reduces to the quasi-Poisson Topp Leone generated-G (QPTLG-G) family. For 
$\beta = a = 1$, the PTLG-G family reduces to the quasi-Poisson Topp Leone G (QPTL-G) family. 

In this chapter, after studying the main statistical properties and presenting some bivariate type extensions, 
we briefly describe seven estimation methods, namely, the maximum likelihood 
estimation (MLE) method, the Cramér-von-Mises estimation (CVM) method, the ordinary least square estimation 
(OLS) method, the weighted least square estimation (WLSE) method, the Anderson Darling estimation (ADE) 
method, the right tail Anderson Darling estimation (RTADE) method and the left tail Anderson Darling estimation 
(LTADE) method. These methods are used to estimate the unknown parameters. Monte Carlo 
simulation experiments are performed to compare the performances of the proposed estimation methods for 
both small and large samples. The new PTLG-G family may be useful in modelling: 


I. Real-life datasets with a "monotonically increasing hazard rate", as illustrated in Section 6 (Figures 2 
and 3 (top left plots)). 

II. Real-life datasets that do not have extreme values, as shown in Section 6 (Figures 2 and 3 (bottom right 
plots) and (bottom left plots)). 

III. Real-life datasets for which the nonparametric kernel density estimates are left-skewed bimodal or 
right-skewed bimodal, as given in Section 6 (Figures 2 and 3 (top right plots)). 

The PTLG-G family proved superior to many other well-known G families, as illustrated 
below: 

I. In modelling the failure times of aircraft windshield items, the PTLG-G family is better than the odd 
log-logistic-G family, the generalized mixture-G family, the transmuted Topp-Leone-G family, the 
Gamma-G family, the Burr-Hatke-G family, the McDonald-G family, the exponentiated-G family, the 
Kumaraswamy-G family, and the proportional reversed hazard rate-G family under the consistent 
information criterion, the Akaike information criterion, the Hannan-Quinn information criterion and the Bayesian 
information criterion. 

II. In modelling the service times of aircraft windshield items, the PTLG-G family is better than the odd 
log-logistic-G family, the generalized mixture-G family, the transmuted Topp-Leone-G family, the 
Gamma-G family, the Burr-Hatke-G family, the McDonald-G family, the exponentiated-G family, the 
Kumaraswamy-G family, and the proportional reversed hazard rate-G family under the consistent 
information criterion, the Akaike information criterion, the Hannan-Quinn information criterion and the Bayesian 
information criterion. 
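The CDF (4) and PDF (5) can be evaluated directly once a baseline is fixed. The following is a minimal sketch, assuming a Lomax baseline $G_W(x) = 1 - (1 + x/c)^{-\lambda}$; the function names and the closed-form quantile (obtained here by inverting (4), not taken from the text) are illustrative assumptions.

```python
import math

def lomax_cdf(x, lam, c):
    """Assumed baseline CDF G_W(x) = 1 - (1 + x/c)^(-lam), x > 0."""
    return 1.0 - (1.0 + x / c) ** (-lam)

def lomax_pdf(x, lam, c):
    """Baseline PDF g_W(x) matching lomax_cdf."""
    return (lam / c) * (1.0 + x / c) ** (-lam - 1.0)

def lomax_quantile(p, lam, c):
    """Inverse of lomax_cdf."""
    return c * ((1.0 - p) ** (-1.0 / lam) - 1.0)

def ptlg_cdf(x, a, alpha, beta, lam, c):
    """Equation (4): C(a) * [1 - exp(-a * {G^beta (2 - G^beta)}^alpha)]."""
    Gb = lomax_cdf(x, lam, c) ** beta
    H = (Gb * (2.0 - Gb)) ** alpha
    return (1.0 - math.exp(-a * H)) / (1.0 - math.exp(-a))

def ptlg_pdf(x, a, alpha, beta, lam, c):
    """Equation (5), written with exp(-a * H) instead of a division."""
    G = lomax_cdf(x, lam, c)
    Gb = G ** beta
    H = (Gb * (2.0 - Gb)) ** alpha
    Ca = 1.0 / (1.0 - math.exp(-a))
    return (2.0 * a * alpha * beta * Ca * lomax_pdf(x, lam, c)
            * G ** (alpha * beta - 1.0) * (1.0 - Gb)
            * (2.0 - Gb) ** (alpha - 1.0) * math.exp(-a * H))

def ptlg_quantile(u, a, alpha, beta, lam, c):
    """Invert (4): solve for H, then y = G^beta from y(2 - y) = H^(1/alpha)."""
    H = -math.log(1.0 - u * (1.0 - math.exp(-a))) / a
    y = 1.0 - math.sqrt(1.0 - H ** (1.0 / alpha))
    return lomax_quantile(y ** (1.0 / beta), lam, c)
```

Feeding uniform random numbers to `ptlg_quantile` gives one possible way to simulate from the family; negative values of `a` are handled by the same formulas.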


2. Copula 


For modelling bivariate real data sets, we shall derive some new bivariate PTLG-G (Bv-PTLG-G) 
type distributions using the "Farlie-Gumbel-Morgenstern copula" (FGMC) (see Morgenstern (1956), 
Farlie (1960), Gumbel (1960), Gumbel (1961) and Johnson and Kotz (1975, 1977)), the modified FGMC 
(see Balakrishnan and Lai (2009)), the "Clayton copula" (see Nelsen (2007)), "Renyi's entropy copula" (REC) 
(see Pougaza and Djafari (2010)) and the "Ali-Mikhail-Haq copula" (AMHC) (see Ali et al. (1978)). The multivariate 
PTLG-G (Mv-PTLG-G) type can be easily derived based on the Clayton copula. Future works may 
be devoted to studying these new models (see also Shehata and Yousof (2021a,b) and Shehata et al. (2021)). 


2.1 BvPTLG-G type via Clayton copula 


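Let $X_1 \sim$ PTLG-G$(P_1)$ and $X_2 \sim$ PTLG-G$(P_2)$. Depending on the continuous marginals $\bar{u} = 1 - u$ and $\bar{m} = 1 - m$, 
the Clayton copula can be considered as 

$$C_K(\bar{u}, \bar{m}) = \left[\max\left(\bar{u}^{-K} + \bar{m}^{-K} - 1;\ 0\right)\right]^{-1/K}, \quad K \in [-1, \infty) - \{0\},\ \bar{u} \in (0,1) \text{ and } \bar{m} \in (0,1).$$

Let $\bar{u} = 1 - F_{P_1}(x_1)|_{P_1}$, $\bar{m} = 1 - F_{P_2}(x_2)|_{P_2}$ and 

$$F_{P_i}(x_i)\big|_{i=1,2} = \frac{1}{1 - \exp(-a_i)}\left[1 - \exp\left(-a_i\{G_{W_i}(x_i)^{\beta_i}[2 - G_{W_i}(x_i)^{\beta_i}]\}^{\alpha_i}\right)\right].$$

Then, the BvPTLG-G type distribution can be obtained from $C_K(\bar{u}, \bar{m})$. A straightforward multivariate 
extension via the Clayton copula can be derived. 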
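As a hedged illustration of the construction above, the sketch below evaluates the Clayton copula at two marginal values in (0,1). The helper `ptlg_cdf_from_G` (which takes the baseline CDF value directly) and the wrapper `bv_ptlg_clayton` are hypothetical names introduced here; the caller decides which marginal quantities, such as $\bar{u}$ and $\bar{m}$, to supply.

```python
import math

def clayton_copula(u, m, K):
    """Clayton copula [max(u^-K + m^-K - 1, 0)]^(-1/K), K in [-1, inf) \\ {0}."""
    if K == 0 or K < -1:
        raise ValueError("K must lie in [-1, inf) and be nonzero")
    inner = max(u ** (-K) + m ** (-K) - 1.0, 0.0)
    if inner == 0.0:
        return 0.0  # the max(., 0) truncation is active (only possible for K < 0)
    return inner ** (-1.0 / K)

def ptlg_cdf_from_G(G, a, alpha, beta):
    """PTLG-G CDF (4) evaluated at a baseline CDF value G in (0, 1)."""
    H = (G ** beta * (2.0 - G ** beta)) ** alpha
    return (1.0 - math.exp(-a * H)) / (1.0 - math.exp(-a))

def bv_ptlg_clayton(G1, G2, P1, P2, K):
    """Couple two PTLG-G marginal values; P1 and P2 are (a, alpha, beta) tuples."""
    u = ptlg_cdf_from_G(G1, *P1)
    m = ptlg_cdf_from_G(G2, *P2)
    return clayton_copula(u, m, K)
```

For example, `bv_ptlg_clayton(0.4, 0.6, (2.0, 1.5, 0.8), (1.0, 2.0, 1.2), K=2.0)` returns a joint probability that never exceeds either marginal value.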


2.2 BvPTLG-G type via REC 


The REC can be derived using the continuous marginal functions $u = 1 - \bar{u} = F_{P_1}(x_1) \in (0,1)$ and 
$m = 1 - \bar{m} = F_{P_2}(x_2) \in (0,1)$ as follows: 

$$F(x_1, x_2) = C(F_{P_1}(x_1), F_{P_2}(x_2)) = x_2 u + x_1 m - x_1 x_2.$$
2.3 BvPTLG-G type via FGMC 


Considering the FGMC, the joint CDF can be written as $C_K(u, m) = um + K\,um\,\bar{u}\bar{m}$, where the continuous 
marginal functions are $u \in (0,1)$, $m \in (0,1)$ and $K \in [-1,1]$. Setting $\bar{u} = 1 - F_{P_1}(x_1)$ and $\bar{m} = 1 - F_{P_2}(x_2)$, we then 
have $F(x_1, x_2) = um(1 + K\bar{u}\bar{m})$. The joint PDF can be expressed as $c_K(u, m) = 1 + K u^{*} m^{*}$, where 
$u^{*} = 1 - 2u$ and $m^{*} = 1 - 2m$, or as $f_K(x_1, x_2) = f_{P_1}(x_1) f_{P_2}(x_2)\, c_K(F_{P_1}(x_1), F_{P_2}(x_2))$, where the two functions $f_K(x_1, x_2)$ 
and $c_K(u, m)$ are the PDFs corresponding to the joint CDFs $F(x_1, x_2)$ and $C_K(u, m)$. 
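The FGM copula and its density are simple enough to check numerically. The sketch below, with illustrative function names, implements $C_K(u,m) = um(1 + K\bar{u}\bar{m})$ and $c_K(u,m) = 1 + K(1-2u)(1-2m)$, plus a finite-difference helper that approximates the mixed partial derivative of the copula.

```python
def fgm_copula(u, m, K):
    """FGM copula C_K(u, m) = u*m*(1 + K*(1-u)*(1-m)), K in [-1, 1]."""
    if not -1.0 <= K <= 1.0:
        raise ValueError("K must lie in [-1, 1]")
    return u * m * (1.0 + K * (1.0 - u) * (1.0 - m))

def fgm_density(u, m, K):
    """Copula density c_K(u, m) = 1 + K*(1 - 2u)*(1 - 2m)."""
    return 1.0 + K * (1.0 - 2.0 * u) * (1.0 - 2.0 * m)

def mixed_difference(u, m, K, h=1e-4):
    """Central finite-difference approximation of d^2 C / (du dm)."""
    return (fgm_copula(u + h, m + h, K) - fgm_copula(u + h, m - h, K)
            - fgm_copula(u - h, m + h, K) + fgm_copula(u - h, m - h, K)) / (4.0 * h * h)
```

Multiplying `fgm_density` evaluated at the two marginal CDF values by the marginal PDFs gives the joint density $f_K(x_1, x_2)$ described above.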


2.4 BvPTLG-G type via modified FGMC 


The modified FGMC can be expressed as $C_K(u, m) = K\,O(u)^{*}\, T(m)^{*} + um$, with 
$O(u)^{*} = u\,O(u)$ and $T(m)^{*} = m\,T(m)$, where $O(u) \in (0,1)$ and $T(m) \in (0,1)$ are two continuous functions with 
$O(u = 0) = O(u = 1) = T(m = 0) = T(m = 1) = 0$. The following four types can be derived and considered: 

Type I: The new bivariate version via modified FGMC Type I can be written as 

$$C_K(u, m) = K\,O(u)^{*}\, T(m)^{*} + um.$$

Type II: Consider $A(u; K_1)$ and $B(m; K_2)$ which satisfy the above conditions, where 

$$A(u; K_1)\big|_{K_1 > 0} = u^{K_1}(1 - u)^{1 - K_1}$$

and 

$$B(m; K_2)\big|_{K_2 > 0} = m^{K_2}(1 - m)^{1 - K_2}.$$

Then, the corresponding bivariate version (modified FGMC Type II) can be derived as 

$$C_{K, K_1, K_2}(u, m) = um + K\,um\,A(u; K_1)\,B(m; K_2).$$

Type III: Let $\tilde{A}(u) = u[\log(1 + \bar{u})]\big|_{\bar{u} = 1 - u}$ and $\tilde{B}(m) = m[\log(1 + \bar{m})]\big|_{\bar{m} = 1 - m}$. Then, the associated CDF of the 
BvPTLG-G-FGM (modified FGMC Type III) is 

$$C_K(u, m) = um + K\,um\,\tilde{A}(u)\,\tilde{B}(m).$$

Type IV: Using the quantile concept, the CDF of the BvPTLG-G-FGM (modified FGMC Type IV) model 
can be obtained as 

$$C(u, m) = u F^{-1}(u) + m F^{-1}(m) - F^{-1}(u) F^{-1}(m),$$

where $F^{-1}(u) = Q(u)$ and $F^{-1}(m) = Q(m)$. 


2.5 BvPTLG-G type via AMHC 


Under the "stronger Lipschitz condition", the joint CDF of the Archimedean Ali-Mikhail-Haq copula can 
be written as 

$$C_K(v, m) = \frac{vm}{1 - K\bar{v}\bar{m}}\bigg|_{K \in (-1,1)},$$

and the corresponding joint PDF of the Archimedean Ali-Mikhail-Haq copula can be expressed as 

$$c_K(v, m) = \frac{1}{[1 - K\bar{v}\bar{m}]^{2}}\left[1 - K + 2K\frac{vm}{1 - K\bar{v}\bar{m}}\right]\bigg|_{K \in (-1,1)}.$$

Then, for any $\bar{v} = 1 - F_{P_1}(x_1)\big|_{\bar{v} = (1 - v) \in (0,1)}$ and $\bar{m} = 1 - F_{P_2}(x_2)\big|_{\bar{m} = (1 - m) \in (0,1)}$, we have 

$$C_K(x_1, x_2) = \frac{F_{P_1}(x_1) F_{P_2}(x_2)}{1 - K[1 - F_{P_1}(x_1)][1 - F_{P_2}(x_2)]}\bigg|_{K \in (-1,1)}$$

and 

$$c_K(x_1, x_2) = \frac{1}{\{1 - K[1 - F_{P_1}(x_1)][1 - F_{P_2}(x_2)]\}^{2}}\left\{1 - K + 2K\frac{F_{P_1}(x_1) F_{P_2}(x_2)}{1 - K[1 - F_{P_1}(x_1)][1 - F_{P_2}(x_2)]}\right\}.$$


3. Mathematical properties 



3.1 Linear representation 
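First, we expand the quantity $A_P(x)$, where 

$$A_P(x) = \exp\left(-a\{G_W(x)^{\beta}[2 - G_W(x)^{\beta}]\}^{\alpha}\right),$$

which leads to 

$$A_P(x) = \sum_{l=0}^{+\infty}\frac{(-1)^{l} a^{l}}{l!}\, G_W(x)^{\alpha\beta l}\,[2 - G_W(x)^{\beta}]^{\alpha l}.$$

Substituting the expansion of $A_P(x)$ in (5), we have 

$$f_P(x) = 2\alpha\beta\, C(a)\, g_W(x)\sum_{l=0}^{+\infty}\frac{(-1)^{l} a^{l+1}}{l!}\, G_W(x)^{\alpha\beta(l+1)-1}\,[1 - G_W(x)^{\beta}]\,[2 - G_W(x)^{\beta}]^{\alpha(l+1)-1}. \tag{6}$$

Consider the power series 

$$\left(1 - \frac{\zeta_1}{\zeta_2}\right)^{\zeta_3} = \sum_{d=0}^{+\infty}(-1)^{d}\binom{\zeta_3}{d}\left(\frac{\zeta_1}{\zeta_2}\right)^{d}, \tag{7}$$

which holds for $|\zeta_1/\zeta_2| < 1$ and $\zeta_3 > 0$ real non-integer. Using (7), the PTLG-G class in (6) can be written as 

$$f_P(x) = \sum_{l,d=0}^{+\infty}\frac{2^{\alpha(l+1)-d}\, a^{l+1}\,\alpha\beta\,(-1)^{l+d}\, C(a)}{l!}\binom{\alpha(l+1)-1}{d}\, g_W(x)\, G_W(x)^{\beta(\alpha+\alpha l+d)-1}\,[1 - G_W(x)^{\beta}],$$

which can be summarized as 

$$f_P(x) = \sum_{l,d=0}^{+\infty}\left\{C_{l,d}\,\pi_{\beta(\alpha+\alpha l+d)}(x;W) - C^{*}_{l,d}\,\pi_{\beta(\alpha+\alpha l+d+1)}(x;W)\right\}, \tag{8}$$

where 

$$C_{l,d} = \frac{\upsilon_{l,d}}{\beta(\alpha+\alpha l+d)}, \qquad C^{*}_{l,d} = \frac{\upsilon_{l,d}}{\beta(\alpha+\alpha l+d+1)},$$

$$\upsilon_{l,d} = \frac{\alpha\beta\, a^{l+1}(-1)^{l+d}\, 2^{\alpha(l+1)-d}\, C(a)}{l!}\binom{\alpha(l+1)-1}{d},$$

and $\pi_{\varsigma}(x) = \varsigma\, g(x)\, G(x)^{\varsigma-1}$ is the exp-G PDF with power parameter $\varsigma$. Equation (8) reveals that the density of the PTLG-G family can be expressed 
as a linear representation of exp-G PDFs. Also, the CDF of the PTLG-G family can be expressed as a mixture 
of exp-G CDFs. By integrating (8), we get 

$$F_P(x) = \sum_{l,d=0}^{+\infty}\left\{C_{l,d}\,\Pi_{\beta(\alpha+\alpha l+d)}(x;W) - C^{*}_{l,d}\,\Pi_{\beta(\alpha+\alpha l+d+1)}(x;W)\right\}, \tag{9}$$

where $\Pi_{\varsigma}(x)$ is the CDF of the exp-G family with power parameter $\varsigma$. 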
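The linear representation can be checked numerically by truncating the double series. The sketch below is an illustration only: it works with a baseline density value `g` and CDF value `G` at a fixed point, uses the generalized binomial coefficient, and relies on truncation limits `L` and `D` that are assumptions suitable for moderate values of $a$.

```python
import math

def gen_binom(c, d):
    """Generalized binomial coefficient (c choose d), real c, integer d >= 0."""
    out = 1.0
    for k in range(d):
        out *= (c - k) / (k + 1)
    return out

def exp_g_pdf(g, G, power):
    """exp-G density pi_power = power * g * G^(power - 1)."""
    return power * g * G ** (power - 1.0)

def ptlg_pdf_closed(g, G, a, alpha, beta):
    """Closed-form PTLG-G density in terms of baseline values g and G."""
    Gb = G ** beta
    H = (Gb * (2.0 - Gb)) ** alpha
    Ca = 1.0 / (1.0 - math.exp(-a))
    return (2.0 * a * alpha * beta * Ca * g * G ** (alpha * beta - 1.0)
            * (1.0 - Gb) * (2.0 - Gb) ** (alpha - 1.0) * math.exp(-a * H))

def ptlg_pdf_series(g, G, a, alpha, beta, L=40, D=80):
    """Truncated mixture of exp-G densities with weights C_{l,d} and C*_{l,d}."""
    Ca = 1.0 / (1.0 - math.exp(-a))
    total = 0.0
    for l in range(L + 1):
        for d in range(D + 1):
            ups = (alpha * beta * a ** (l + 1) * (-1.0) ** (l + d)
                   * 2.0 ** (alpha * (l + 1) - d) * Ca / math.factorial(l)
                   * gen_binom(alpha * (l + 1) - 1.0, d))
            p1 = beta * (alpha + alpha * l + d)      # first power parameter
            p2 = beta * (alpha + alpha * l + d + 1)  # second power parameter
            total += (ups / p1) * exp_g_pdf(g, G, p1) - (ups / p2) * exp_g_pdf(g, G, p2)
    return total
```

For large $|a|$ the alternating terms grow before they decay, so more terms and some care with floating-point cancellation would be needed.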
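3.2 Moments 


The $r$th ordinary moment of $X$, where $X$ follows the PTLG-G family with parameters $(a, \alpha, \beta, W)$, is given by 
$\mu'_{r,X} = E(X^{r}) = \int_{-\infty}^{\infty} x^{r} f_P(x)\,dx$. Then we obtain 

$$\mu'_{r,X} = \sum_{l,d=0}^{+\infty}\left\{C_{l,d}\,E\left(Y^{r}_{\beta(\alpha+\alpha l+d)}\right) - C^{*}_{l,d}\,E\left(Y^{r}_{\beta(\alpha+\alpha l+d+1)}\right)\right\}, \tag{10}$$

where $Y_{\varsigma}$ has the density of the exp-G model with power parameter $\varsigma$. The expected value $E(X)$ can be derived 
from (10) when $r = 1$. The integrations in $E(Y^{r}_{\beta(\alpha+\alpha l+d)})$ and $E(Y^{r}_{\beta(\alpha+\alpha l+d+1)})$ can be performed numerically 
for most parent distributions (see Table 2). The $n$th central moment of $X$, the variance ($V(X)$), skewness ($S(X)$), 
kurtosis ($K(X)$) and dispersion index ($DI(X)$) measures can be derived using well-known relationships. The 
$s$th incomplete moment, say $I_{s,X}(t)$, of $X$ can be expressed from (9) as $I_{s,X}(t) = \int_{-\infty}^{t} x^{s} f_P(x)\,dx$. Then, 

$$I_{s,X}(t) = \sum_{l,d=0}^{+\infty}\left\{C_{l,d}\int_{-\infty}^{t} x^{s}\,\pi_{\beta(\alpha+\alpha l+d)}(x;W)\,dx - C^{*}_{l,d}\int_{-\infty}^{t} x^{s}\,\pi_{\beta(\alpha+\alpha l+d+1)}(x;W)\,dx\right\}. \tag{11}$$

The mean deviation about the mean and the mean deviation about the median of $X$ are given by 
$\Delta_{1,\mu}(X) = E|X - \mu'_{1,X}| = 2\mu'_{1,X} F(\mu'_{1,X}) - 2 I_{1,X}(\mu'_{1,X})$ and $\Delta_{2,M}(X) = E|X - M| = \mu'_{1,X} - 2 I_{1,X}(M)$, respectively, where 
$\mu'_{1,X} = E(X)$, $M = \text{Median}(X) = Q(1/2)$ is the median, $F(\mu'_{1,X})$ is obtained from (4) and $I_{1,X}(t)$ is the first 
incomplete moment given by (11) with $s = 1$ as 

$$I_{1,X}(t) = \sum_{l,d=0}^{+\infty}\left\{C_{l,d}\,J_{\beta(\alpha+\alpha l+d)}(t) - C^{*}_{l,d}\,J_{\beta(\alpha+\alpha l+d+1)}(t)\right\},$$

where $J_{\varsigma}(t) = \int_{-\infty}^{t} x\,\pi_{\varsigma}(x)\,dx$ is the first incomplete moment of the exp-G distribution. The moment generating 
function $M_X(t) = E(e^{tX})$ of $X$ can be derived as 

$$M_X(t) = \sum_{l,d=0}^{+\infty}\left\{C_{l,d}\,M_{\beta(\alpha+\alpha l+d)}(t;W) - C^{*}_{l,d}\,M_{\beta(\alpha+\alpha l+d+1)}(t;W)\right\},$$

where $M_{\varsigma}(t)$ is the moment generating function of $Y_{\varsigma}$. 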


3.3 Moment of the residual life 
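The $n$th moment of the residual life is given by $V_{n,X}(t) = E[(X - t)^{n} \mid X > t]$, $n \in \mathbb{N}$. The $n$th moment of the 
residual life of $X$ can be given as $V_{n,X}(t) = \frac{1}{1 - F_P(t)}\int_{t}^{\infty}(x - t)^{n} f_P(x)\,dx$. Therefore, using (8) we have 

$$V_{n,X}(t) = \frac{1}{1 - F_P(t)}\sum_{l,d=0}^{+\infty}\sum_{r=0}^{n}\binom{n}{r}(-t)^{n-r}\left\{C_{l,d}\int_{t}^{+\infty} x^{r}\,\pi_{\beta(\alpha+\alpha l+d)}(x;W)\,dx - C^{*}_{l,d}\int_{t}^{+\infty} x^{r}\,\pi_{\beta(\alpha+\alpha l+d+1)}(x;W)\,dx\right\}.$$

The life expectation can then be defined by $V_{1,X}(t) = E[(X - t) \mid X > t]$ (the case $n = 1$), which represents the expected 
additional life length for a unit that is alive at age $t$. The mean residual life (MRL) of $X$ can be obtained by setting $n = 1$ in the 
last equation. 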
3.4 Moment of the reversed residual life 


The $n$th moment of the reversed residual life is given by $U_{n,X}(t) = E[(t - X)^{n} \mid X \le t]$, $t > 0$, $n \in \mathbb{N}$. The $n$th 
moment of the reversed residual life of $X$ can be given as $U_{n,X}(t) = \frac{1}{F_P(t)}\int_{0}^{t}(t - x)^{n} f_P(x)\,dx$. Then, the $n$th 
moment of the reversed residual life of $X$ becomes 

$$U_{n,X}(t) = \frac{1}{F_P(t)}\sum_{l,d=0}^{+\infty}\sum_{r=0}^{n}(-1)^{r}\binom{n}{r} t^{n-r}\left\{C_{l,d}\int_{0}^{t} x^{r}\,\pi_{\beta(\alpha+\alpha l+d)}(x;W)\,dx - C^{*}_{l,d}\int_{0}^{t} x^{r}\,\pi_{\beta(\alpha+\alpha l+d+1)}(x;W)\,dx\right\}.$$

The mean inactivity time (MIT) is given by $U_{1,X}(t) = E[(t - X) \mid X \le t]$, $t > 0$ (the case $n = 1$), and it refers to the 
waiting time elapsed since the failure of an item, on the condition that this failure occurred in $(0, t)$. 


3.5 Probability weighted moments 


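The $(s, r)$th PWM of $X$ following the PTLG-G family, say $\rho_{s,r}$, is formally defined by $\rho_{s,r} = E\{X^{s} F(X)^{r}\}$. 
Using equations (4) and (5), we can write 

$$f_P(x)\,F_P(x)^{r} = \sum_{l,d=0}^{+\infty}\left\{K_{l,d}(r)\,\pi_{\beta(\alpha+\alpha l+d)}(x;W) - K^{*}_{l,d}(r)\,\pi_{\beta(\alpha+\alpha l+d+1)}(x;W)\right\},$$

where 

$$K_{l,d}(r) = \frac{\omega_{l,d}}{\beta(\alpha+\alpha l+d)}, \qquad K^{*}_{l,d}(r) = \frac{\omega_{l,d}}{\beta(\alpha+\alpha l+d+1)},$$

$$\omega_{l,d} = \sum_{p=0}^{+\infty}\frac{\alpha\beta\, a^{1+l}(1 + p)^{l}(-1)^{p+l+d}\, 2^{\alpha(l+1)-d}}{l!}\, C(a)^{r+1}\binom{r}{p}\binom{\alpha(l+1)-1}{d},$$

and $\pi_{\varsigma}(x)$ is defined above. Then, the $(s, r)$th PWM of $X$ can be expressed as 

$$\rho_{s,r} = \sum_{l,d=0}^{+\infty}\left\{K_{l,d}(r)\,E\left(Y^{s}_{\beta(\alpha+\alpha l+d)}\right) - K^{*}_{l,d}(r)\,E\left(Y^{s}_{\beta(\alpha+\alpha l+d+1)}\right)\right\}.$$
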
3.6 Order statistics 


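Let $X_1, \ldots, X_n$ be a random sample from the PTLG-G family of distributions and let $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ be the 
corresponding order statistics. The PDF of the $i$th order statistic, say $X_{i:n}$, can be written as 

$$f_{i:n}(x) = \frac{f_P(x)}{B(i, n-i+1)}\sum_{j=0}^{n-i}(-1)^{j}\binom{n-i}{j} F_P(x)^{j+i-1}, \tag{12}$$

where $B(\cdot,\cdot)$ is the beta function. Substituting (4) and (5) in equation (12) and using a power series expansion, 
we get 

$$f_P(x)\,F_P(x)^{j+i-1} = \sum_{l,d=0}^{+\infty}\left\{K_{l,d}(j+i-1)\,\pi_{\beta(\alpha+\alpha l+d)}(x;W) - K^{*}_{l,d}(j+i-1)\,\pi_{\beta(\alpha+\alpha l+d+1)}(x;W)\right\},$$

where 

$$K_{l,d}(j+i-1) = \frac{\omega_{l,d}}{\beta(\alpha+\alpha l+d)}, \qquad K^{*}_{l,d}(j+i-1) = \frac{\omega_{l,d}}{\beta(\alpha+\alpha l+d+1)},$$

$$\omega_{l,d} = \sum_{p=0}^{+\infty}\frac{\alpha\beta\, a^{1+l}(1 + p)^{l}(-1)^{p+l+d}\, 2^{\alpha(l+1)-d}}{l!}\, C(a)^{j+i}\binom{j+i-1}{p}\binom{\alpha(l+1)-1}{d}.$$

Then, the PDF of $X_{i:n}$ can be written as 

$$f_{i:n}(x) = \sum_{j=0}^{n-i}\frac{(-1)^{j}\binom{n-i}{j}}{B(i, n-i+1)}\sum_{l,d=0}^{+\infty}\left\{K_{l,d}(j+i-1)\,\pi_{\beta(\alpha+\alpha l+d)}(x;W) - K^{*}_{l,d}(j+i-1)\,\pi_{\beta(\alpha+\alpha l+d+1)}(x;W)\right\}.$$

Thus, the density function of the PTLG-G order statistics is a mixture of exp-G PDFs. Based on the last 
result, the main properties of $X_{i:n}$ follow from those of $Y_{\beta(\alpha+\alpha l+d)}$ and $Y_{\beta(\alpha+\alpha l+d+1)}$, which 
are exp-G random variables with power parameters $\beta(\alpha+\alpha l+d)$ and $\beta(\alpha+\alpha l+d+1)$, respectively. For example, the $s$th 
moment of $X_{i:n}$ can be expressed as 

$$E(X^{s}_{i:n}) = \sum_{j=0}^{n-i}\frac{(-1)^{j}\binom{n-i}{j}}{B(i, n-i+1)}\sum_{l,d=0}^{+\infty}\left\{K_{l,d}(j+i-1)\,E\left(Y^{s}_{\beta(\alpha+\alpha l+d)}\right) - K^{*}_{l,d}(j+i-1)\,E\left(Y^{s}_{\beta(\alpha+\alpha l+d+1)}\right)\right\}. \tag{13}$$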


4. Characterizations 


In this section, we present certain characterizations of the PTLG-G distribution in the following cases: 
(i) based on two truncated moments, and (ii) in terms of the reverse hazard function. We present characterizations 
(i) and (ii) in two subsections. 


4.1 Characterizations based on two truncated moments 


This subsection deals with the characterizations of the PTLG-G distribution in terms of a simple relationship 
between two truncated moments. For the first characterization, we use a theorem of Glänzel (1987); see 
Theorem 4.1.1 below, as stated in Glänzel (1987). 


Theorem 4.1.1. Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a given probability space and let $H = [d, e]$ be an interval for some 
$d < e$ ($d = -\infty$, $e = \infty$ might as well be allowed). Let $X: \Omega \to H$ be a continuous random variable with the 
cumulative distribution function $F_P$ and let $Q_1$ and $Q_2$ be two real functions defined on $H$ such that 

$$E[Q_2(X) \mid X \ge x] = E[Q_1(X) \mid X \ge x]\,\xi(x), \quad x \in H,$$

is defined with some real function $\xi$. Assume that $Q_1(X), Q_2(X) \in C^{1}(H)$, $\xi \in C^{2}(H)$ and $F_P$ is a twice 
continuously differentiable and strictly monotone function on the set $H$. Finally, assume that the equation 
$\xi(X) Q_1(X) = Q_2(X)$ has no real solution in the interior of $H$. Then $F_P$ is uniquely determined by the 
functions $Q_1(X)$, $Q_2(X)$ and $\xi$, particularly 

$$F_P(x) = \int_{d}^{x} C\left|\frac{\xi'(u)}{\xi(u) Q_1(u) - Q_2(u)}\right|\exp(-s(u))\,du,$$

where the function $s(\cdot)$ is a solution of the differential equation $s'(\cdot) = \frac{\xi'(\cdot)\,Q_1(\cdot)}{\xi(\cdot) Q_1(\cdot) - Q_2(\cdot)}$ and $C$ is the 
normalization constant, such that $\int_{H} dF_P(x) = 1$. 


Remark 4.1.1. The goal in the above theorem is to have $\xi(x)$ as simple as possible. 

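Proposition 4.1.1. Let $X: \Omega \to \mathbb{R}$ be a continuous random variable and let 

$$Q_1(x) = \frac{\exp\left(a\{G_W(x)^{\beta}[2 - G_W(x)^{\beta}]\}^{\alpha}\right)}{[1 - G_W(x)^{\beta}][2 - G_W(x)^{\beta}]^{\alpha-1}}$$

and $Q_2(x) = Q_1(x)\,G_W(x)^{\alpha\beta}$ for $x \in \mathbb{R}$. The random variable $X$ has PDF (5) if and only if the function $\xi$ defined 
in Theorem 4.1.1 has the form 

$$\xi(x) = \frac{1}{2}\left\{1 + G_W(x)^{\alpha\beta}\right\}, \quad x \in \mathbb{R}.$$

Proof. Let $X$ be a random variable with PDF (5). Then 

$$(1 - F_P(x))\,E[Q_1(X) \mid X \ge x] = 2a\,C(a)\left[1 - G_W(x)^{\alpha\beta}\right], \quad x \in \mathbb{R},$$

and 

$$(1 - F_P(x))\,E[Q_2(X) \mid X \ge x] = a\,C(a)\left[1 - G_W(x)^{2\alpha\beta}\right], \quad x \in \mathbb{R},$$

and finally 

$$\xi(x) Q_1(x) - Q_2(x) = \frac{1}{2} Q_1(x)\left[1 - G_W(x)^{\alpha\beta}\right] > 0, \quad x \in \mathbb{R}.$$

Conversely, if $\xi$ is given as above, then 

$$s'(x) = \frac{\xi'(x) Q_1(x)}{\xi(x) Q_1(x) - Q_2(x)} = \frac{\alpha\beta\, g_W(x)\, G_W(x)^{\alpha\beta-1}}{1 - G_W(x)^{\alpha\beta}}, \quad x \in \mathbb{R},$$

and hence $s(x) = -\log\left[1 - G_W(x)^{\alpha\beta}\right]$, $x \in \mathbb{R}$. Now, according to Theorem 4.1.1, $X$ has PDF (5). 


Corollary 4.1.1. Let $X: \Omega \to \mathbb{R}$ be a continuous random variable and let $Q_1(x)$ be as in Proposition 4.1.1. 
Then $X$ has PDF (5) if and only if there exist functions $Q_2(x)$ and $\xi$ defined in Theorem 4.1.1 satisfying the 
first-order differential equation 

$$\frac{\xi'(x) Q_1(x)}{\xi(x) Q_1(x) - Q_2(x)} = \frac{\alpha\beta\, g_W(x)\, G_W(x)^{\alpha\beta-1}}{1 - G_W(x)^{\alpha\beta}}, \quad x \in \mathbb{R}.$$


Corollary 4.1.2. The general solution of the above differential equation is 

$$\xi(x) = \left[1 - G_W(x)^{\alpha\beta}\right]^{-1}\left[-\int \alpha\beta\, g_W(x)\, G_W(x)^{\alpha\beta-1}\,(Q_1(x))^{-1} Q_2(x)\,dx + D\right],$$

where $D$ is a constant. A set of functions satisfying the above differential equation is given in Proposition 4.1.1 
with $D = 1/2$. It should, however, be mentioned that there are other triplets $(Q_1, Q_2, \xi)$ satisfying the conditions of Theorem 4.1.1. 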

4.2 Characterization based on reverse hazard function 

The reverse hazard function, $r_{F_P}$, of a twice differentiable distribution function, $F_P$, is defined as 

$$r_{F_P}(x) = \frac{f_P(x)}{F_P(x)}, \quad x \in \text{support of } F_P.$$

This subsection is devoted to a characterization of the PTLG-G distribution in terms of the reverse 
hazard function. 


Proposition 4.2.1. Let $X: \Omega \to \mathbb{R}$ be a continuous random variable. Then $X$ has PDF (5) if and only if its 
reverse hazard function $r_{F_P}(x)$ satisfies the following first-order differential equation: 

$$r'_{F_P}(x) - \frac{(\alpha\beta - 1)\, g_W(x)}{G_W(x)}\, r_{F_P}(x) = 2a\alpha\beta\, G_W(x)^{\alpha\beta-1}\,\frac{d}{dx}\left\{\frac{g_W(x)[1 - G_W(x)^{\beta}][2 - G_W(x)^{\beta}]^{\alpha-1}}{\exp\left(a\{G_W(x)^{\beta}[2 - G_W(x)^{\beta}]\}^{\alpha}\right) - 1}\right\}, \quad x \in \mathbb{R},$$

with boundary condition $\lim_{x \to \infty} r_{F_P}(x) = 0$. 


Proof. If $X$ has PDF (5), then clearly the above differential equation holds. Now, if the differential equation 
holds, then 

$$\frac{d}{dx}\left\{r_{F_P}(x)\, G_W(x)^{-(\alpha\beta-1)}\right\} = 2a\alpha\beta\,\frac{d}{dx}\left\{\frac{g_W(x)[1 - G_W(x)^{\beta}][2 - G_W(x)^{\beta}]^{\alpha-1}}{\exp\left(a\{G_W(x)^{\beta}[2 - G_W(x)^{\beta}]\}^{\alpha}\right) - 1}\right\},$$

or 

$$r_{F_P}(x) = 2a\alpha\beta\,\frac{g_W(x)\, G_W(x)^{\alpha\beta-1}[1 - G_W(x)^{\beta}][2 - G_W(x)^{\beta}]^{\alpha-1}}{\exp\left(a\{G_W(x)^{\beta}[2 - G_W(x)^{\beta}]\}^{\alpha}\right) - 1}, \quad x \in \mathbb{R},$$

which is the reverse hazard function of the PTLG-G distribution. 
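The closed form of the reverse hazard function can be compared with the ratio of the PDF (5) to the CDF (4). The sketch below is illustrative: it takes the baseline density value `g` and CDF value `G` at a point as inputs, and all function names are assumptions.

```python
import math

def ptlg_cdf_val(G, a, alpha, beta):
    """CDF (4) at a baseline CDF value G."""
    H = (G ** beta * (2.0 - G ** beta)) ** alpha
    return (1.0 - math.exp(-a * H)) / (1.0 - math.exp(-a))

def ptlg_pdf_val(g, G, a, alpha, beta):
    """PDF (5) at baseline density value g and CDF value G."""
    Gb = G ** beta
    H = (Gb * (2.0 - Gb)) ** alpha
    Ca = 1.0 / (1.0 - math.exp(-a))
    return (2.0 * a * alpha * beta * Ca * g * G ** (alpha * beta - 1.0)
            * (1.0 - Gb) * (2.0 - Gb) ** (alpha - 1.0) * math.exp(-a * H))

def reverse_hazard_closed(g, G, a, alpha, beta):
    """Closed form: 2 a alpha beta g G^(ab-1) (1-G^b) (2-G^b)^(alpha-1) / (exp(aH) - 1)."""
    Gb = G ** beta
    H = (Gb * (2.0 - Gb)) ** alpha
    return (2.0 * a * alpha * beta * g * G ** (alpha * beta - 1.0)
            * (1.0 - Gb) * (2.0 - Gb) ** (alpha - 1.0) / math.expm1(a * H))

def reverse_hazard_ratio(g, G, a, alpha, beta):
    """Definition r(x) = f(x) / F(x)."""
    return ptlg_pdf_val(g, G, a, alpha, beta) / ptlg_cdf_val(G, a, alpha, beta)
```

The constant $C(a)$ cancels in the ratio, which is why it does not appear in the closed form.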


5. Studying a special model 


In this section, a new special PTLG-G model based on the Lomax distribution, called the PTLG-G Lomax 
(PTLGL) distribution, is considered. Plots of the PDF and the HRF of the PTLGL distribution for some 
selected parameter values are sketched (see Figure 1). Two theorems related 
to the ordinary and incomplete moments of the exponentiated Lomax (exp-L) distribution are presented. 
Theorem 5.1 and Theorem 5.2 are employed for deriving relevant mathematical properties of the PTLGL 
distribution (see Table 1). Numerical results for the mean, variance, skewness, kurtosis and dispersion index of the PTLGL 
distribution are listed in Table 2. Figure 1 gives some PDF and HRF plots of the PTLGL model with $c = 1$. 
Based on Figure 1 (right plot), the PDF of the PTLGL can be "asymmetric right-skewed" and "symmetric" 
with many useful shapes. Based on Figure 1 (left plot), the HRF of the PTLGL can be "constant", "decreasing", 
"upside-down", "increasing" and "increasing-constant". 

Below, we present two theorems related to the exp-L distribution. The two theorems are employed in 
deriving the mathematical properties in Table 1. 


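Theorem 5.1. Let $Y_{\varsigma+1}$ be a random variable having the exponentiated Lomax (exp-L) distribution with 
power parameter $\varsigma + 1$. Then, the CDF of the exp-L model can be expressed as 

$$\Pi_{\varsigma+1}(x) = \left[1 - \left(1 + \frac{x}{c}\right)^{-\lambda}\right]^{\varsigma+1}.$$

Then, the $r$th ordinary moment of $Y_{\varsigma+1}$ is given by 

$$E(Y^{r}_{\varsigma+1}) = (\varsigma+1)\,c^{r}\sum_{m=0}^{r}(-1)^{m}\binom{r}{m} B\left(\varsigma+1, \frac{m-r}{\lambda}+1\right)\bigg|_{\lambda > r},$$

where $B(\varsigma_1, \varsigma_2) = \int_{0}^{1} u^{\varsigma_1-1}(1 - u)^{\varsigma_2-1}\,du$ is the complete beta function. 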

[Figure 1 panels: PDF and HRF curves of the PTLGL model. Legend values recoverable from the plot, in the order $(a, \alpha, \beta, \lambda)$: (50, 100, 1, 1), (5, 1.5, 5, 25), (-10, 1, 1, 0.99), (150, 100, 0.15, 1) and (-5, 1, 1, 1).] 

Figure 1: PDFs and HRFs for the PTLGL model. 




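Table 1: Theoretical results of the PTLGL model. 

To shorten the entries, write $\theta_{l,d} = \beta(\alpha+\alpha l+d)$ and $\theta^{*}_{l,d} = \beta(\alpha+\alpha l+d+1)$. Here $B(\cdot,\cdot)$ is the complete beta function and $B_t(\cdot,\cdot)$ the incomplete beta function. 

Property: $E(X^{r})$. Support: $\lambda > r$. 
Result: 
$$\sum_{l,d=0}^{+\infty}\sum_{m=0}^{r} c^{r}(-1)^{m}\binom{r}{m}\left\{C_{l,d}\,\theta_{l,d}\,B\left(\theta_{l,d}, \tfrac{m-r}{\lambda}+1\right) - C^{*}_{l,d}\,\theta^{*}_{l,d}\,B\left(\theta^{*}_{l,d}, \tfrac{m-r}{\lambda}+1\right)\right\}$$

Property: $M_X(t)$. Support: $\lambda > r$. 
Result: 
$$\sum_{l,d=0}^{+\infty}\sum_{r=0}^{+\infty}\sum_{m=0}^{r}\frac{t^{r}}{r!}\,c^{r}(-1)^{m}\binom{r}{m}\left\{C_{l,d}\,\theta_{l,d}\,B\left(\theta_{l,d}, \tfrac{m-r}{\lambda}+1\right) - C^{*}_{l,d}\,\theta^{*}_{l,d}\,B\left(\theta^{*}_{l,d}, \tfrac{m-r}{\lambda}+1\right)\right\}$$

Property: $I_{s,X}(t)$. Support: $\lambda > s$. 
Result: 
$$\sum_{l,d=0}^{+\infty}\sum_{m=0}^{s} c^{s}(-1)^{m}\binom{s}{m}\left\{C_{l,d}\,\theta_{l,d}\,B_t\left(\theta_{l,d}, \tfrac{m-s}{\lambda}+1\right) - C^{*}_{l,d}\,\theta^{*}_{l,d}\,B_t\left(\theta^{*}_{l,d}, \tfrac{m-s}{\lambda}+1\right)\right\}$$

Property: $I_{1,X}(t)$. Support: $\lambda > 1$. 
Result: 
$$\sum_{l,d=0}^{+\infty}\sum_{m=0}^{1} c\,(-1)^{m}\binom{1}{m}\left\{C_{l,d}\,\theta_{l,d}\,B_t\left(\theta_{l,d}, \tfrac{m-1}{\lambda}+1\right) - C^{*}_{l,d}\,\theta^{*}_{l,d}\,B_t\left(\theta^{*}_{l,d}, \tfrac{m-1}{\lambda}+1\right)\right\}$$

Property: $V_{n,X}(t)$. Support: $t > 0$, $n \in \mathbb{N}$, $\lambda > n$. 
Result: 
$$\frac{1}{1 - F_P(t)}\sum_{l,d=0}^{+\infty}\sum_{m=0}^{n} c^{n}(-1)^{m}\binom{n}{m}\left\{C^{(V,n)}_{l,d}\,\theta_{l,d}\left[B\left(\theta_{l,d}, \tfrac{m-n}{\lambda}+1\right) - B_t\left(\theta_{l,d}, \tfrac{m-n}{\lambda}+1\right)\right] - C^{*(V,n)}_{l,d}\,\theta^{*}_{l,d}\left[B\left(\theta^{*}_{l,d}, \tfrac{m-n}{\lambda}+1\right) - B_t\left(\theta^{*}_{l,d}, \tfrac{m-n}{\lambda}+1\right)\right]\right\},$$
where $C^{(V,n)}_{l,d} = C_{l,d}\sum_{r=0}^{n}\binom{n}{r}(-t)^{n-r}$ and $C^{*(V,n)}_{l,d} = C^{*}_{l,d}\sum_{r=0}^{n}\binom{n}{r}(-t)^{n-r}$. 

Property: $V_{1,X}(t)$. Support: $t > 0$, $\lambda > 1$. 
Result: the expression for $V_{n,X}(t)$ with $n = 1$, where $C^{(V,1)}_{l,d} = C_{l,d}\sum_{r=0}^{1}\binom{1}{r}(-t)^{1-r}$ and $C^{*(V,1)}_{l,d} = C^{*}_{l,d}\sum_{r=0}^{1}\binom{1}{r}(-t)^{1-r}$. 

Property: $U_{n,X}(t)$. Support: $t > 0$, $n \in \mathbb{N}$, $\lambda > n$. 
Result: 
$$\frac{1}{F_P(t)}\sum_{l,d=0}^{+\infty}\sum_{m=0}^{n} c^{n}(-1)^{m}\binom{n}{m}\left\{C^{(U,n)}_{l,d}\,\theta_{l,d}\,B_t\left(\theta_{l,d}, \tfrac{m-n}{\lambda}+1\right) - C^{*(U,n)}_{l,d}\,\theta^{*}_{l,d}\,B_t\left(\theta^{*}_{l,d}, \tfrac{m-n}{\lambda}+1\right)\right\},$$
where $C^{(U,n)}_{l,d} = C_{l,d}\sum_{r=0}^{n}(-1)^{r}\binom{n}{r}t^{n-r}$ and $C^{*(U,n)}_{l,d} = C^{*}_{l,d}\sum_{r=0}^{n}(-1)^{r}\binom{n}{r}t^{n-r}$. 

Property: $U_{1,X}(t)$. Support: $t > 0$, $\lambda > 1$. 
Result: the expression for $U_{n,X}(t)$ with $n = 1$, where $C^{(U,1)}_{l,d} = C_{l,d}\sum_{r=0}^{1}(-1)^{r}\binom{1}{r}t^{1-r}$ and $C^{*(U,1)}_{l,d} = C^{*}_{l,d}\sum_{r=0}^{1}(-1)^{r}\binom{1}{r}t^{1-r}$. 

Property: $\rho_{s,r}$. Support: $\lambda > s$. 
Result: 
$$\sum_{l,d=0}^{+\infty}\sum_{m=0}^{s} c^{s}(-1)^{m}\binom{s}{m}\left\{K_{l,d}(r)\,\theta_{l,d}\,B\left(\theta_{l,d}, \tfrac{m-s}{\lambda}+1\right) - K^{*}_{l,d}(r)\,\theta^{*}_{l,d}\,B\left(\theta^{*}_{l,d}, \tfrac{m-s}{\lambda}+1\right)\right\}$$

Property: $E(X^{s}_{i:n})$. Support: $\lambda > s$. 
Result: 
$$\sum_{j=0}^{n-i}\frac{(-1)^{j}\binom{n-i}{j}}{B(i, n-i+1)}\sum_{l,d=0}^{+\infty}\sum_{m=0}^{s} c^{s}(-1)^{m}\binom{s}{m}\left\{K_{l,d}(j+i-1)\,\theta_{l,d}\,B\left(\theta_{l,d}, \tfrac{m-s}{\lambda}+1\right) - K^{*}_{l,d}(j+i-1)\,\theta^{*}_{l,d}\,B\left(\theta^{*}_{l,d}, \tfrac{m-s}{\lambda}+1\right)\right\}$$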

Theorem 5.2. Let $Y_{\varsigma+1}$ be a random variable having the exp-L distribution with power parameter $\varsigma + 1$. 
Then, the $r$th incomplete moment of $Y_{\varsigma+1}$ is given by 

$$I_{r,\varsigma+1}(t) = (\varsigma+1)\,c^{r}\sum_{m=0}^{r}(-1)^{m}\binom{r}{m} B_t\left(\varsigma+1, \frac{m-r}{\lambda}+1\right)\bigg|_{\lambda > r},$$

where $B_t(\varsigma_1, \varsigma_2) = \int_{0}^{t} u^{\varsigma_1-1}(1 - u)^{\varsigma_2-1}\,du$ is the incomplete beta function. 

Table 2 below gives a numerical analysis of the mean ($E(X)$), variance ($V(X)$), skewness ($S(X)$), 
kurtosis ($K(X)$) and dispersion index ($DI(X)$) for the PTLGL distribution. Based on the results listed in Table 2, it 
is noted that $E(X)$ decreases as $a$ increases; $E(X)$ increases as $\alpha$ increases; $E(X)$ decreases as $\beta$ increases; 
$S(X) \in (2.316822, 8105.145)$; $K(X)$ ranging from 2.247183 to and $DI(X)$ is always more than one, which 
means that the PTLGL model will be suitable for "over-dispersed" data sets. 


Table 2: E(X), V(X), S(X), K(X) and DI(X) for the PTLGL model. 

Parameter cells (as extracted; merged cells are blank) | E(X) | V(X) | S(X) | K(X) | DI(X) 
200 | 1 3 | 13.82128 | 46.424300 | 5.545125 | 356.415 | 3.358899 
2,2 | -100 | 11.25278 | 32.974520 | 5.520506 | 362.5764 | 2.930343 
 | 9.081657 | 23.473030 | 5.48318 | 367.0563 | 2.584664 
 | 1.67 x 10^-5 | 0.4034740 | 5.011508 | 466.8554 | 24147.98 
 | 3.34 x 10^-5 | 1.0325210 | 2.903812 | 143.5774 | 30898.23 
 | 5.01 x 10^-5 | 1.6932800 | 2.316822 | 80.60721 | 33780.93 
 | 8.35 x 10^-6 | 0.1275966 | 13.06459 | 2338.006 | 15277.52 
 | 1.49 x 10^-5 | 0.2174070 | 10.31513 | 1430.786 | 14643.04 
 | 2.32 x 10^-5 | 0.3264422 | 8.658873 | 990.9689 | 14072.31 
 | 8.538794 | 79956.260 | 177.0491 | 41749.03 | 9363.883 
 | 5.518941 | 48957.670 | 226.2184 | 68170.23 | 8870.845 
0.5,0.25 | 10 | 0.1845575 | 38.186930 | 8105.145 | 87450930 | 206.9108 
0.5,0.25 | -100 | 1 0.5 | 57.78778 | 622701.80 | 63.46806 | 5361.263 | 10775.66 
0.5,0.25 | 10 | 427.9407 | 60500080 | 20.47917 | 555.0093 | 14137.49 
0.5,0.25 | 10000 | 26955.06 | 924499115 | 0.736274 | 2.247183 | 34297.79 
0.5,0.25 | 100 | 10000 1 | 6382.283 | 464195758 | 3.228576 | 11.87127 | 72731.94 
0.5,0.25 | 15 | 362.8912 | 315368350 | 15.63219 | 248.3138 | 86904.37 
0.5,0.25 | 5 | 6.777539 | 619562.20 | 116.9011 | 13746.34 | 91414.04 
6. Estimation methods 


This section briefly describes different classical estimation methods: the MLE method, CVM 
method, OLS method, WLSE method, ADE method, RTADE method, and LTADE method. All these methods 
are discussed in more detail in the statistical literature. In this work, we omit some derivation 
details to avoid repetition. 


6.1 The MLE method 


Let $x_1, \ldots, x_m$ be a random sample from the PTLG-G distribution. To determine the maximum likelihood 
estimates (MLEs), we first derive the log-likelihood function 

$$\ell(P) = m\log 2 - m\log(1 - e^{-a}) + m\log a + m\log\alpha + m\log\beta + \sum_{i=1}^{m}\log g_W(x_i) + (\alpha\beta - 1)\sum_{i=1}^{m}\log G_W(x_i)$$
$$\qquad + \sum_{i=1}^{m}\log[1 - G_W(x_i)^{\beta}] + (\alpha - 1)\sum_{i=1}^{m}\log[2 - G_W(x_i)^{\beta}] - a\sum_{i=1}^{m}\{G_W(x_i)^{\beta}[2 - G_W(x_i)^{\beta}]\}^{\alpha}.$$

The components of the score vector are 

$$U_a = \frac{\partial}{\partial a}\ell(P), \quad U_{\alpha} = \frac{\partial}{\partial\alpha}\ell(P), \quad U_{\beta} = \frac{\partial}{\partial\beta}\ell(P) \quad \text{and} \quad U_W = \frac{\partial}{\partial W}\ell(P).$$

Setting the nonlinear system of equations $U_a = U_{\alpha} = U_{\beta} = 0$ and $U_W = 0$ and solving them simultaneously 
yields the MLEs. To solve these equations, it is usually more convenient to use nonlinear optimization methods 
such as the quasi-Newton algorithm to numerically maximize $\ell(P)$. 
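The log-likelihood above can be coded term by term and passed to any numerical optimizer. The following is a minimal sketch, assuming a Lomax baseline with parameters `lam` and `c`; the function names are hypothetical, and the terms $m\log a - m\log(1 - e^{-a})$ are combined into one logarithm so that negative values of $a$ can also be evaluated.

```python
import math

def lomax_cdf(x, lam, c):
    """Assumed baseline CDF G_W(x) = 1 - (1 + x/c)^(-lam)."""
    return 1.0 - (1.0 + x / c) ** (-lam)

def lomax_pdf(x, lam, c):
    """Baseline PDF matching lomax_cdf."""
    return (lam / c) * (1.0 + x / c) ** (-lam - 1.0)

def ptlgl_pdf(x, a, alpha, beta, lam, c):
    """PDF (5) with the Lomax baseline, used here only as a cross-check."""
    G = lomax_cdf(x, lam, c)
    Gb = G ** beta
    H = (Gb * (2.0 - Gb)) ** alpha
    return (2.0 * a * alpha * beta / (1.0 - math.exp(-a)) * lomax_pdf(x, lam, c)
            * G ** (alpha * beta - 1.0) * (1.0 - Gb)
            * (2.0 - Gb) ** (alpha - 1.0) * math.exp(-a * H))

def ptlgl_loglik(data, a, alpha, beta, lam, c):
    """Log-likelihood written term by term as in the displayed formula."""
    m = len(data)
    # log(a / (1 - e^{-a})) equals log a - log(1 - e^{-a}) and is defined for a < 0 too
    ll = m * (math.log(2.0) + math.log(alpha) + math.log(beta)
              + math.log(a / (1.0 - math.exp(-a))))
    for x in data:
        G = lomax_cdf(x, lam, c)
        Gb = G ** beta
        ll += math.log(lomax_pdf(x, lam, c))
        ll += (alpha * beta - 1.0) * math.log(G)
        ll += math.log(1.0 - Gb) + (alpha - 1.0) * math.log(2.0 - Gb)
        ll -= a * (Gb * (2.0 - Gb)) ** alpha
    return ll
```

In practice the negative of `ptlgl_loglik` would be handed to a quasi-Newton routine, with the parameters constrained to their admissible ranges.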


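6.2 The CVME method 


The CVMEs of $a$, $\alpha$, $\beta$ and $W$ are obtained by minimizing the following expression with respect to $a$, $\alpha$, $\beta$ and 
$W$, respectively: 

$$\text{CVME}(P) = \frac{1}{12} m^{-1} + \sum_{i=1}^{m}\left[F_P(x_{[i,m]}) - c_{(i,m)}\right]^{2},$$

where $c_{(i,m)} = (2i - 1)/(2m)$, so that 

$$\text{CVME}(P) = \frac{1}{12} m^{-1} + \sum_{i=1}^{m}\left(C(a)\left[1 - \exp\left(-a\{G_W(x_{[i,m]})^{\beta}[2 - G_W(x_{[i,m]})^{\beta}]\}^{\alpha}\right)\right] - c_{(i,m)}\right)^{2}.$$

The CVMEs of $a$, $\alpha$, $\beta$ and $W$ are obtained by solving the following non-linear equations: 

$$0 = \sum_{i=1}^{m}\left(F_P(x_{[i,m]}) - c_{(i,m)}\right)\nabla_{(a)}(x_{[i,m]}; P),$$

$$0 = \sum_{i=1}^{m}\left(F_P(x_{[i,m]}) - c_{(i,m)}\right)\nabla_{(\alpha)}(x_{[i,m]}; P),$$

$$0 = \sum_{i=1}^{m}\left(F_P(x_{[i,m]}) - c_{(i,m)}\right)\nabla_{(\beta)}(x_{[i,m]}; P),$$

and 

$$0 = \sum_{i=1}^{m}\left(F_P(x_{[i,m]}) - c_{(i,m)}\right)\nabla_{(W)}(x_{[i,m]}; P),$$

where $F_P(x_{[i,m]})$ is given by (4) and 

$$\nabla_{(a)}(x_{[i,m]}; P) = \frac{\partial F_P(x_{[i,m]})}{\partial a}, \quad \nabla_{(\alpha)}(x_{[i,m]}; P) = \frac{\partial F_P(x_{[i,m]})}{\partial\alpha}, \quad \nabla_{(\beta)}(x_{[i,m]}; P) = \frac{\partial F_P(x_{[i,m]})}{\partial\beta} \quad \text{and} \quad \nabla_{(W)}(x_{[i,m]}; P) = \frac{\partial F_P(x_{[i,m]})}{\partial W}.$$
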


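6.3 The OLSE method 

Let $F_P(x_{[i,m]})$ denote the CDF of the PTLG-G family and let $x_1 < x_2 < \cdots < x_m$ be the $m$ ordered random sample (RS) values. The 
OLSEs are obtained by minimizing 

$$\text{OLSE}(P) = \sum_{i=1}^{m}\left[F_P(x_{[i,m]}) - b_{(i,m)}\right]^{2}.$$

Then, we have 

$$\text{OLSE}(P) = \sum_{i=1}^{m}\left(C(a)\left[1 - \exp\left(-a\{G_W(x_{[i,m]})^{\beta}[2 - G_W(x_{[i,m]})^{\beta}]\}^{\alpha}\right)\right] - b_{(i,m)}\right)^{2},$$

where $b_{(i,m)} = \frac{i}{m+1}$. The OLSEs are obtained by solving the following non-linear equations: 

$$0 = \sum_{i=1}^{m}\left(F_P(x_{[i,m]}) - b_{(i,m)}\right)\nabla_{(a)}(x_{[i,m]}; P),$$

$$0 = \sum_{i=1}^{m}\left(F_P(x_{[i,m]}) - b_{(i,m)}\right)\nabla_{(\alpha)}(x_{[i,m]}; P),$$

$$0 = \sum_{i=1}^{m}\left(F_P(x_{[i,m]}) - b_{(i,m)}\right)\nabla_{(\beta)}(x_{[i,m]}; P),$$

and 

$$0 = \sum_{i=1}^{m}\left(F_P(x_{[i,m]}) - b_{(i,m)}\right)\nabla_{(W)}(x_{[i,m]}; P),$$

where $F_P(x_{[i,m]})$ is given by (4) and $\nabla_{(\cdot)}(x_{[i,m]}; P)$ denotes the partial derivative of $F_P(x_{[i,m]})$ with respect to the indicated parameter. 
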


6.4 The WLSE method 
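The WLSEs are obtained by minimizing the function WLSE$(a, \alpha, \beta, W)$ with respect to $a$, $\alpha$, $\beta$ and $W$, where

$$\mathrm{WLSE}(P) = \sum_{i=1}^{m}\omega_{(i,m)}\left[F_P(x_{[i,m]}) - D_{(i,m)}\right]^2,$$

with $D_{(i,m)} = i/(m+1)$ as in Section 6.3, and

$$\omega_{(i,m)} = \frac{(1+m)^2(2+m)}{i(1+m-i)}.$$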




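The WLSEs are obtained by solving

$$0 = \sum_{i=1}^{m}\omega_{(i,m)}\left[F_P(x_{[i,m]}) - D_{(i,m)}\right] V_{(a)}(x_{[i,m]}; P),$$

$$0 = \sum_{i=1}^{m}\omega_{(i,m)}\left[F_P(x_{[i,m]}) - D_{(i,m)}\right] V_{(\alpha)}(x_{[i,m]}; P),$$

$$0 = \sum_{i=1}^{m}\omega_{(i,m)}\left[F_P(x_{[i,m]}) - D_{(i,m)}\right] V_{(\beta)}(x_{[i,m]}; P),$$

and

$$0 = \sum_{i=1}^{m}\omega_{(i,m)}\left[F_P(x_{[i,m]}) - D_{(i,m)}\right] V_{(W)}(x_{[i,m]}; P),$$

where $\omega_{(i,m)} = (1+m)^2(2+m)/[i(1+m-i)]$ is the weight of the $i$-th ordered observation and $F_P(x_{[i,m]}) = C(a)\left(1 - \exp\left(-a\left\{G_W(x_{[i,m]})^{\alpha}\left[2 - G_W(x_{[i,m]})^{\alpha}\right]\right\}^{\beta}\right)\right)$.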


6.5 The ADE method 
The ADEs of $a$, $\alpha$, $\beta$ and $W$ are obtained by minimizing the function

$$\mathrm{ADE}(P) = -m - \frac{1}{m}\sum_{i=1}^{m}(2i-1)\left\{\log F_P(x_{[i,m]}) + \log\left[1 - F_P(x_{[m+1-i:m]})\right]\right\}.$$

The parameter estimates of $a$, $\alpha$, $\beta$ and $W$ follow by solving the following nonlinear equations:

$$0 = \frac{\partial\,\mathrm{ADE}(P)}{\partial a}, \quad 0 = \frac{\partial\,\mathrm{ADE}(P)}{\partial \alpha}, \quad 0 = \frac{\partial\,\mathrm{ADE}(P)}{\partial \beta} \quad \text{and} \quad 0 = \frac{\partial\,\mathrm{ADE}(P)}{\partial W}.$$


6.6 The RTADE method 
The RTADEs of $a$, $\alpha$, $\beta$ and $W$ are obtained by minimizing

$$\mathrm{RTADE}(P) = \frac{1}{2}m - 2\sum_{i=1}^{m}F_P(x_{[i,m]}) - \frac{1}{m}\sum_{i=1}^{m}(2i-1)\log\left[1 - F_P(x_{[m+1-i:m]})\right].$$

The estimates of $a$, $\alpha$, $\beta$ and $W$ are obtained by solving the following nonlinear equations:

$$0 = \frac{\partial\,\mathrm{RTADE}(P)}{\partial a}, \quad 0 = \frac{\partial\,\mathrm{RTADE}(P)}{\partial \alpha}, \quad 0 = \frac{\partial\,\mathrm{RTADE}(P)}{\partial \beta} \quad \text{and} \quad 0 = \frac{\partial\,\mathrm{RTADE}(P)}{\partial W}.$$


6.7 The LTADE method 
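The LTADEs of $a$, $\alpha$, $\beta$ and $W$ are obtained by minimizing

$$\mathrm{LTADE}(P) = -\frac{3}{2}m + 2\sum_{i=1}^{m}F_P(x_{[i,m]}) - \frac{1}{m}\sum_{i=1}^{m}(2i-1)\log F_P(x_{[i,m]}).$$

The parameter estimates of $a$, $\alpha$, $\beta$ and $W$ are obtained by solving the following nonlinear equations:

$$0 = \frac{\partial\,\mathrm{LTADE}(P)}{\partial a}, \quad 0 = \frac{\partial\,\mathrm{LTADE}(P)}{\partial \alpha}, \quad 0 = \frac{\partial\,\mathrm{LTADE}(P)}{\partial \beta} \quad \text{and} \quad 0 = \frac{\partial\,\mathrm{LTADE}(P)}{\partial W}.$$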
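
The distance criteria of Sections 6.2 to 6.7 depend on the model only through the fitted CDF evaluated at the ordered sample. The following Python sketch is an illustration, not the authors' implementation: it assumes the standard forms of the six criteria and takes a list `u` with `u[i-1]` equal to $F_P(x_{[i,m]})$, each value strictly between 0 and 1. All function names are hypothetical. In practice each function would be handed to a numerical minimizer over the model parameters.

```python
import math

def cvme(u):
    """Cramer-von Mises criterion: 1/(12m) + sum (u_i - (2i-1)/(2m))^2."""
    m = len(u)
    return 1.0 / (12 * m) + sum((u[i - 1] - (2 * i - 1) / (2 * m)) ** 2
                                for i in range(1, m + 1))

def olse(u):
    """Ordinary least squares criterion with D_(i,m) = i/(m+1)."""
    m = len(u)
    return sum((u[i - 1] - i / (m + 1)) ** 2 for i in range(1, m + 1))

def wlse(u):
    """Weighted least squares criterion with weights (m+1)^2 (m+2) / (i (m+1-i))."""
    m = len(u)
    return sum((m + 1) ** 2 * (m + 2) / (i * (m + 1 - i))
               * (u[i - 1] - i / (m + 1)) ** 2 for i in range(1, m + 1))

def rtade(u):
    """Right-tail Anderson-Darling criterion."""
    m = len(u)
    # u[m - i] is the CDF at the (m+1-i)-th ordered observation (0-based list).
    tail = sum((2 * i - 1) * math.log(1.0 - u[m - i]) for i in range(1, m + 1))
    return m / 2.0 - 2.0 * sum(u) - tail / m

def ltade(u):
    """Left-tail Anderson-Darling criterion."""
    m = len(u)
    head = sum((2 * i - 1) * math.log(u[i - 1]) for i in range(1, m + 1))
    return -1.5 * m + 2.0 * sum(u) - head / m

def ade(u):
    """Anderson-Darling criterion."""
    m = len(u)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[m - i]))
            for i in range(1, m + 1))
    return -m - s / m
```

With these forms the right-tail and left-tail criteria add up to the full Anderson-Darling criterion, which gives a quick consistency check on any implementation.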
7. Comparing methods 
7.1 Simulations for competitive estimation methods 


A numerical simulation is performed to compare the classical estimation methods. The simulation study is based on N = 1000 data sets generated from the PTLGL model defined in Section 5, where m = 50, 100, 150 and 300, with the following initial values:

Blend   a0     α0    β0    λ0    c0
I       -2     1.5   1.5   0.6   0.1
II      12     1.2   1.2   1.2   0.3
III     -1.2   2     2.5   2     0.5

The estimates are compared in terms of their biases and root mean squared errors (RMSE). The mean of the absolute difference between the theoretical and the estimated values (D-abs) and the maximum absolute difference between the true parameters and the estimates (D-max) are also reported. 

Tables 3, 4 and 5 give the simulation results. In all three tables the RMSE tends to zero as m increases, which indicates that the estimators are consistent. 
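
As a rough illustration of how such a comparison is computed, the Python sketch below evaluates the bias and RMSE of an estimator over repeated samples. It does not use the PTLGL model: as a stand-in assumption it estimates the mean of an exponential distribution by the sample mean, and the names `bias_and_rmse` and `simulate_rmse` are hypothetical. The same loop applies to any estimation method once the sampler and the estimator are swapped in.

```python
import math
import random

def bias_and_rmse(estimates, true_value):
    """Bias and root mean squared error of a list of replicate estimates."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rmse

def simulate_rmse(m, n_rep=1000, true_mean=1.0, seed=0):
    """RMSE of the sample mean over n_rep exponential samples of size m."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(m)]
        estimates.append(sum(sample) / m)  # stand-in estimator (exponential MLE)
    return bias_and_rmse(estimates, true_mean)[1]

# RMSE for increasing sample sizes; it should shrink as m grows.
rmse_by_m = {m: simulate_rmse(m) for m in (20, 50, 100, 300)}
```

A shrinking RMSE across the entries of `rmse_by_m` is the pattern that the tables report for the estimators studied here.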


Table 3: Simulation results for blend I. 

Method  m    RMSE(a)   RMSE(α)    RMSE(β)    RMSE(λ)    RMSE(c)    D-abs      D-max
MLE     20   0.781899  0.211464   0.074903   0.005958   0.001024   0.029454   0.043999
OLS     20   0.780846  0.2189683  0.0758624  0.006782   0.000931   0.014568   0.022046
WLS     20   0.791896  0.2115054  0.077497   0.0060914  0.0010457  0.0094253  0.014446
CVM     20   0.793498  0.222415   0.076071   0.006433   0.000982   0.017160   0.026129
ADE     20   0.752596  0.200083   0.072198   0.005903   0.000960   0.011598   0.017838
RTADE   20   0.967502  0.296462   0.090337   0.005451   0.001212   0.034909   0.051486
LTADE   20   0.760520  0.187759   0.071923   0.008726   0.000951   0.004153   0.007710
MLE     50   0.280263  0.066134   0.026733   0.002284   0.000350   0.006209   0.009787
OLS     50   0.312155  0.083412   0.030663   0.002686   0.000363   0.009815   0.014591
WLS     50   0.311915  0.074718   0.029695   0.002439   0.000389   0.006040   0.009152
CVM     50   0.314216  0.083993   0.030698   0.002522   0.000380   0.010983   0.016405
ADE     50   0.298349  0.075533   0.028905   0.002365   0.000368   0.009224   0.013740
RTADE   50   0.381704  0.107703   0.036506   0.002297   0.000457   0.018436   0.027157
LTADE   50   0.289757  0.070427   0.028190   0.002968   0.000363   0.003952   0.006431
MLE     100  0.141951  0.032639   0.013284   0.001076   0.000170   0.007603   0.011291
OLS     100  0.149553  0.039150   0.014721   0.001285   0.000172   0.008413   0.012366
WLS     100  0.151366  0.034563   0.014151   0.001157   0.000186   0.007877   0.011635
CVM     100  0.150093  0.039309   0.014733   0.001204   0.000179   0.009013   0.013288
ADE     100  0.142256  0.035504   0.013810   0.001117   0.000173   0.008373   0.012307
RTADE   100  0.175811  0.047481   0.016824   0.001082   0.000203   0.012165   0.017849
LTADE   100  0.137238  0.033120   0.013493   0.001384   0.000172   0.006271   0.009304
MLE     300  0.047571  0.010863   0.004528   0.000349   0.000057   0.001991   0.002952
OLS     300  0.052247  0.013298   0.005113   0.000448   0.000058   0.000357   0.000618
WLS     300  0.054718  0.012101   0.005025   0.000386   0.000065   0.003253   0.004853
CVM     300  0.052296  0.013311   0.005113   0.000420   0.000061   0.000558   0.000935
ADE     300  0.050503  0.012346   0.004868   0.000390   0.000059   0.000526   0.000845
RTADE   300  0.058356  0.015174   0.005544   0.000362   0.000065   0.002464   0.003640
LTADE   300  0.050990  0.012053   0.004992   0.000509   0.00062    0.000920   0.001376
Table 4: Simulation results for blend II. 

Method  m    RMSE(a)   RMSE(α)   RMSE(β)   RMSE(λ)   RMSE(c)   D-abs     D-max
MLE     20   0.703433  0.068196  0.035148  0.072391  0.007238  0.024048  0.036846
OLS     20   0.716239  0.081305  0.038636  0.084582  0.007433  0.014173  0.021749
WLS     20   0.695036  0.068648  0.034998  0.075706  0.007510  0.012948  0.019316
CVM     20   0.726111  0.083086  0.038959  0.085188  0.007353  0.015584  0.024444
ADE     20   0.687264  0.069222  0.034893  0.075277  0.007139  0.012287  0.018917
RTADE   20   0.705086  0.119748  0.050572  0.070665  0.008090  0.040076  0.058915
LTADE   20   0.868825  0.062239  0.032851  0.110364  0.007663  0.010083  0.016389
MLE     50   0.265814  0.022575  0.012358  0.026278  0.002578  0.003940  0.007280
OLS     50   0.259876  0.027913  0.014016  0.028914  0.002666  0.013770  0.020128
WLS     50   0.267629  0.023352  0.012672  0.025851  0.002749  0.015659  0.022755
CVM     50   0.261113  0.028243  0.014089  0.028973  0.002654  0.014404  0.021179
ADE     50   0.251497  0.024705  0.012865  0.026217  0.002570  0.013592  0.019834
RTADE   50   0.258310  0.036706  0.017213  0.025333  0.002760  0.023339  0.033973
LTADE   50   0.290383  0.022483  0.012165  0.033664  0.002808  0.006031  0.009264
MLE     100  0.135498  0.011094  0.006240  0.012999  0.001362  0.008541  0.012652
OLS     100  0.137583  0.014554  0.007398  0.015174  0.001384  0.008315  0.012131
WLS     100  0.143940  0.011688  0.006489  0.013513  0.001398  0.006783  0.009986
CVM     100  0.137908  0.014640  0.007417  0.015191  0.001381  0.008616  0.012629
ADE     100  0.133280  0.012828  0.006766  0.013889  0.001329  0.007574  0.011049
RTADE   100  0.134403  0.018140  0.008721  0.013271  0.001385  0.012224  0.017800
LTADE   100  0.155833  0.011748  0.006458  0.017970  0.001486  0.003948  0.005962
MLE     300  0.042737  0.003521  0.001979  0.004077  0.000419  0.003977  0.005838
OLS     300  0.042560  0.004327  0.002236  0.004686  0.000422  0.002902  0.004226
WLS     300  0.049362  0.003822  0.002163  0.004573  0.000477  0.003545  0.005295
CVM     300  0.042597  0.004338  0.002238  0.004688  0.000421  0.003008  0.004402
ADE     300  0.042011  0.003905  0.002085  0.004385  0.000413  0.003127  0.004552
RTADE   300  0.043507  0.005513  0.002719  0.004330  0.000438  0.003099  0.004543
LTADE   300  0.046642  0.003513  0.001945  0.005292  0.000449  0.003427  0.004983


7.2 Applications for comparing the competitive estimation methods 


Two applications to real data sets are considered for comparing the estimation methods. The 1st data set, called the “aircraft windshield” data, consists of the failure times of 84 aircraft windshields. The 2nd data set, also called the “aircraft windshield” data, consists of the service times of 63 aircraft windshields. Both data sets were reported by Murthy et al. (2004). The required computations are carried out using the MATHCAD software. To compare the estimation methods, we consider the Cramér-von Mises (CVM) and the Anderson-Darling (AD) statistics. These two statistics are widely used to determine how closely a specific CDF fits the empirical distribution of a given data set. Table 6 and Table 7 give the estimates and the test statistics for all the estimation methods. From Table 6 we conclude that the MLE method is the best method, with CVM = 0.04630 and AD = 0.47186. From Table 7 we conclude that the MLE method is again the best method, with CVM = 0.05717 and AD = 0.35320. However, all the other methods also performed well. 
Table 5: Simulation results for blend III. 

Method  m    RMSE(a)    RMSE(α)    RMSE(β)    RMSE(λ)    RMSE(c)    D-abs      D-max
MLE     20   0.7651606  0.4001974  0.2165702  0.0411279  0.0059717  0.0245247  0.0359504
OLS     20   0.6591711  0.3669806  0.1997503  0.0414978  0.0053998  0.0091343  0.0136777
WLS     20   0.693372   0.375594   0.206029   0.038757   0.005734   0.032803   0.047369
CVM     20   0.734169   0.454807   0.227511   0.040798   0.005851   0.045657   0.066121
ADE     20   0.696793   0.384389   0.208877   0.038572   0.005690   0.039343   0.056894
RTADE   20   0.828938   0.564812   0.263819   0.037734   0.006190   0.062648   0.090633
LTADE   20   0.711550   0.349402   0.202762   0.048661   0.006159   0.024155   0.035170
MLE     50   0.2524062  0.1089245  0.0686767  0.0143282  0.0020404  0.0092676  0.0135787
OLS     50   0.2799867  0.1502873  0.0853749  0.0169848  0.0023274  0.0161194  0.0233995
WLS     50   0.293754   0.126963   0.079370   0.016905   0.002382   0.015562   0.022531
CVM     50   0.289337   0.153758   0.087473   0.017725   0.002392   0.019822   0.028732
ADE     50   0.280983   0.135956   0.081320   0.016778   0.002317   0.017172   0.024871
RTADE   50   0.331773   0.193546   0.103037   0.016625   0.002544   0.026703   0.038675
LTADE   50   0.284320   0.124703   0.078610   0.020253   0.002475   0.011298   0.016442
MLE     100  0.0898733  0.037102   0.0243643  0.0052244  0.0007408  0.0030559  0.0044817
OLS     100  0.0846877  0.042357   0.0253857  0.0053323  0.0007115  0.0049563  0.0071751
WLS     100  0.130984   0.052145   0.033951   0.007536   0.001039   0.005642   0.008177
CVM     100  0.127780   0.063662   0.037961   0.008066   0.001060   0.004655   0.006776
ADE     100  0.123282   0.056301   0.034942   0.007551   0.001011   0.006121   0.008852
RTADE   100  0.142757   0.076450   0.043247   0.007547   0.001096   0.001831   0.002794
LTADE   100  0.124947   0.052689   0.034212   0.008998   0.001090   0.008960   0.012943
MLE     300  0.0420026  0.017779   0.0115895  0.0024205  0.0003438  0.0031073  0.0045127
OLS     300  0.0438612  0.0217394  0.0131186  0.0027502  0.0003667  0.0002438  0.0004378
WLS     300  0.049366   0.019694   0.012902   0.002673   0.000390   0.003894   0.005668
CVM     300  0.045063   0.022351   0.013460   0.002806   0.000376   0.002516   0.003655
ADE     300  0.043535   0.020087   0.012473   0.002607   0.000358   0.001940   0.002817
RTADE   300  0.048705   0.025865   0.014805   0.002523   0.000374   0.003293   0.004774
LTADE   300  0.044840   0.019134   0.012446   0.003177   0.000395   0.001065   0.001571

Table 6: Comparing estimation methods via an application. 

         Estimates                                              Test statistics
Method   a         α        β         λ         c          C*       A*
MLE      4.91315   0.06419  20.1558   178.8356  221.2843   0.04630  0.47186
OLS      8.09745   0.16768  4.24998   20.18322  30.99778   0.06171  0.63323
WLS      3.82957   0.14885  15.51752  42.96923  46.92708   0.04931  0.50496
CVM      -7.85575  0.19925  3.94773   18.35556  27.74697   0.06207  0.63341
ADE      5.14699   0.14422  8.48739   28.97846  39.05812   0.04792  0.49513
RTADE    -1.07411  0.24895  16.68032  40.08727  49.21435   0.10472  0.98303
LTADE    5.06196   0.20184  5.75072   334.9566  508.5706   0.04701  0.48306

Table 7: Comparing estimation methods via an application. 

         Estimates                                               Test statistics
Method   a         α        β         λ          c           C*       A*
MLE      -2.78092  0.08558  12.07814  106.42604  151.53397   0.05717  0.35320
OLS      3.10846   0.09195  10.59231  30.10565   43.20463    0.06025  0.37151
WLS      3.50419   0.06986  11.09497  80.91018   119.7961    0.05783  0.35863
CVM      3.04688   0.09150  11.23096  44.68476   63.03266    0.05760  0.35573
ADE      3.26377   0.09084  9.70303   53.84903   80.11256    0.05775  0.35723
RTADE    -2.98547  0.08683  13.69396  35.68186   46.07807    0.05772  0.35591
LTADE    -3.35132  0.08810  8.58356   83.39752   137.66545   0.05916  0.36597


8. Comparing competitive models 


Two real-life data applications are presented under the Lomax model to illustrate the importance and flexibility of the family. The fits of the PTLGL model are compared with those of the other Lomax extensions listed in Table 8. The 1st real-life data set represents the failure times of 84 aircraft windshields. The 2nd real-life data set represents the service times of 63 aircraft windshields. The two data sets are chosen because their properties match the shapes of the PDF in Figure 1 (right plot). By examining Figure 1 (right plot) we see that the PDF of the PTLGL model can be “symmetric” or “asymmetric right-skewed” with different shapes. On the other hand, by exploring the two real-life data sets, we note that their densities are asymmetric (see Figure 2 (top right plot) and Figure 3 (top right plot)). Moreover, the theoretical HRF of the PTLG-G family includes the “asymmetric monotonically increasing” shape, and the HRFs of the two real data sets are “asymmetric monotonically increasing” (see Figure 1 (left plot), Figure 2 (top left plot) and Figure 3 (top left plot)). The two real data sets were reported by Murthy et al. (2004). The “nonparametric kernel density estimation (KDE)” tool is employed for exploring the initial PDF shape. “Normality” is checked by the “Quantile-Quantile” (Q-Q) plot. The initial 


Table 8: The competitive models. 

No.  Model                                Abbreviation  Author
1    Special generalized mixture-L        SGML          Chesneau and Yousof (2021)
2    Odd log-logistic-L                   OLLL          Elgohari and Yousof (2020)
3    Reduced OLL-L                        ROLLL         Elgohari and Yousof (2020)
4    Reduced Burr-Hatke-L                 RBHL          Yousof et al. (2018c)
5    Transmuted Topp-Leone-L              TTLL          Yousof et al. (2017b)
6    Reduced TTL-L                        RTTLL         Yousof et al. (2017b)
7    Gamma-L                              GamL          Cordeiro et al. (2015)
8    Kumaraswamy-L                        KumL          Lemonte and Cordeiro (2013)
9    McDonald-L                           McL           Lemonte et al. (2013)
10   Beta-L                               BL            Lemonte et al. (2013)
11   Exponentiated-L                      exp-L         Gupta et al. (1998)
12   L                                    L             Lomax (1954)
13   Proportional reversed hazard rate-L  PRHRL         New
Figure 2: TTT, NKDE, Q-Q and box plot for the 1st data set. 


HRF shapes are explored via the “total time in test (TTT)” plot. The “box plot” explores the extreme values. Based on Figures 2 and 3 (top left plots), the HRFs are “monotonically increasing” for the two data sets. Based on Figures 2 and 3 (top right plots), the PDFs are asymmetric for the two data sets. Based on Figures 2 and 3 (bottom left plots), “normality” holds. Based on Figures 2 and 3 (bottom right plots), no extreme values are spotted. The following goodness-of-fit (G-O-F) statistics are used for comparing the competitive models: the “Akaike information criterion” (AICr), the “consistent AIC” (CAICr), the “Bayesian IC” (BICr) and the “Hannan-Quinn IC” (HQICr). Tables 9 and 11 give the MLEs and the corresponding standard errors (SEs) for the two real-life data sets. Tables 10 and 12 list the four G-O-F statistics for the two real-life data sets. Figures 4 and 5 give the Probability-Probability 
Figure 3: TTT, NKDE, Q-Q and box plot for the 2nd data set. 


(P-P), Kaplan-Meier survival (KMS), estimated PDF (E-PDF), estimated CDF (E-CDF) and estimated HRF (E-HRF) plots for the two data sets, respectively. Based on Tables 10 and 12, it is noted that the PTLGL model gives the lowest values for all G-O-F statistics with AICr = 269.8712, CAICr = 270.6404, BICr = 282.0253 and HQICr = 274.7570 for the 1st data set, and AICr = 208.582, CAICr = 209.6347, BICr = 218.2977 and HQICr = 212.7966 for the 2nd data set among all fitted competitive models. So, it could be selected as the best model under these G-O-F criteria. 
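
The chapter does not state the formulas behind the four criteria, so the Python sketch below assumes their usual definitions, with the CAICr taken as the small-sample corrected AIC. Under that assumption, and with k = 5 parameters and n = 84 observations, the formulas reproduce the PTLGL values reported for the 1st data set. The function name and the log-likelihood value, which is backed out from the reported AICr, are illustrative.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC, corrected AIC and Hannan-Quinn IC from a maximized log-likelihood.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    The definitions are the usual ones and are an assumption about the chapter.
    """
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = aic + 2.0 * k * (k + 1) / (n - k - 1)
    hqic = -2.0 * loglik + 2.0 * k * math.log(math.log(n))
    return {"AIC": aic, "BIC": bic, "CAIC": caic, "HQIC": hqic}

# Log-likelihood implied by the reported AICr = 269.8712 with k = 5.
loglik_ptlgl = -(269.8712 - 2 * 5) / 2.0
criteria = information_criteria(loglik_ptlgl, k=5, n=84)
```

Lower values indicate a better trade-off between fit and the number of parameters, which is why the criteria can rank models with different numbers of parameters.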
Table 9: MLEs and SEs for the 1st data set. 


Model Estimates 
PTLGL (a,a,8,1,c) 5.4829 0.2484 4.6215 17.199 25.023 
(2.0125) (0.6335) (11.269) (20.515) (41.598) 
KL (4,6, A,c) 2.6150 100.276 5.27710 78.6774 
(0.3822) (120.49) (9.8116) (186.01) 
TTLL (4,6, 4,c) —0.8075 2.47663 (15608) (38628) 
(0.1396) (0.5418) (1602.4) (123.94) 
BL (a,b, A4,c) 3.60360 33.6387 4.83070 118.837 
(0.6187) (63.715) (9.2382) (428.93) 
PRHRL (8, 4,c) 3.73 10° 4.71x107 4.5x10° 
1.01«10° (0.00001) 37.1468 
SGML (4, 4,c) —1.04«107 9.83% 10° 1.18107 
(0.1223) (4843.3) (501.04) 
RTTLL (4,8, 2) —0.84732 5.52057 1.15678 
(0.1001) (1.1848) (0.0959) 
OLLL (4, 4,c) 2.32636 7.17xe8 2.3x10° 
(2.14«107) (1.19xe4) (2.6*10') 
exp-L (4, A,c) 3.62610 20074.5 26257.7 
(0.6236) (2041.8) (99.743) 
GamL (b A,c) 3.58760 52001.4 37029.7 
(0.5133) (7955.0) (81.16) 
ROLLL (b,A) 3.89056 0.57316 
(0.3652) (0.0195) 
RBHL (A,c) 1080175 5136722 
(983309) (232313) 
L(A,c) 51425.44 1317902 
(5933.52) (296.120) 
Table 10: G-O-F statistics for the 1st data set. 
Model AICr BICr CAICr HQICr 
PTLGL 269.8712 282.0253 270.6404 274.757 
OLLL 274.847 282.139 275.147 277.779 
TTLL 279.140 288.863 279.646 283.049 
GamL 282.808 290.136 283.105 285.756 
BL 285.435 295.206 285.935 289.365 
exp-L 288.799 296.127 289.096 291.747 
ROLLL 289.690 294.552 289.839 291.645 
SGML 292.175 299.467 292.475 295.106 
RTTLL 313.962 321.254 314.262 316.893 
PRHRL 331.754 339.046 332.054 334.686 
L 333.977 338.862 334.123 335.942 
RBHL 341.208 346.070 341.356 343.162 
Table 11: MLEs and SEs for the 2nd data set. 


Model Estimates 
PTLGL (4,4,f,/,c) 2.9426 0.0689 15.3547 17.7850 22.6267 
(1.4316) (0.0575) (12.667) (18.172) (24.430) 
BL (a,b 1c) 1.9218 31.2594 4.9684 169.572 
(0.318) (316.84) (50.528) (339.21) 
KL (4,6, A,c) 1.6691 60.5673 2.56490 65.0640 
(0.257) (86.013) (4.7589) (177.59) 
TTLL (4,6, 4,c) (-0.607) 1.78578 2123.39 4822.79 
(0.2137) (0.4152) (163.92) (200.01) 
RTTLL (4,8, 4) —0.6715 2.74496 1.01238 
(0.18746) (0.6696) (0.1141) 
PRHRL (4, 4,c) 1.59x10° 3.93x107 1.30x10° 
2.01108 0.001« 107! 0.95x 10° 
SGML (6 1,c) —1.04«107 6.45 10° 6.33x 10° 
(4.1107) (3.2110°) (3.8573) 
GamL (4, /,c) 1.9073 35842.433 39197.57 
(0.3213) (6945.074) (151.653) 
OLLL (4, 4,c) 1.66419 6.340% 105 2.01 10° 
(1.8*107) (1.68% 10*) 7.22x 10° 
exp-L (4, A,c) 1.9145 22971.15 32882.0 
(0.348) (3209.53) (162.22) 
RBHL (4,c) 14055522 53203423 
(422.01) (28.5232) 
ROLLL (8, 4) 2.37233 0.69109 
(0.2683) (0.0449) 
Lic) 99269.8 207019.4 
(11864) (301.237) 


Table 12: G-O-F statistics for the 2nd data set. 


Model AICr BICr CAICr HQICr 
PTLGL 208.582 218.2977 209.6347 212.7966 
KL 209.735 218.308 210.425 213.107 
TTLL 212.900 221.472 213.589 216.271 
GamL 211.666 218.096 212.073 214.195 
SGML 211.788 218.218 212.195 214.317 
BL 213.922 222.495 214.612 217.294 
exp-L 213.099 219.529 213.506 215.628 
OLLL 215.808 222.238 216.215 218.337 
PRHRL 224.597 231.027 225.004 227.126 
L 222.598 226.884 222.798 224.283 
ROLLL 225.457 229.744 225.657 227.143 
RTTLL 230.371 236.800 230.778 232.900 
RBHL 229.201 233.487 229.401 230.887 
Figure 4: EPDF, EHRF, P-P, KMS plots for the 1st data set. 
Figure 5: EPDF, EHRF, P-P, KMS plots for the 2nd data set. 
9. Conclusions 


A new compound G family of distributions called the Poisson Topp Leone generated-G (PTLG-G) family is defined and studied. The PTLG-G family is constructed by compounding the Poisson and the Topp Leone generated-G families. A special case based on the Lomax model, called the Poisson Topp Leone generated Lomax (PTLGL) model, is studied and analyzed. The density function of the PTLGL model can be “asymmetric right-skewed” or “symmetric” with many useful shapes. The hazard rate of the PTLGL model can be “constant”, “decreasing”, “upside-down”, “increasing” or “increasing-constant”. Relevant properties of the PTLGL model, including the moments of the residual life, ordinary moments, moments of the reversed residual life, incomplete moments, probability weighted moments, order statistics and mean deviation, are derived and numerically analyzed. Several new bivariate PTLG-G families based on the “Clayton copula”, “Farlie-Gumbel-Morgenstern copula”, “modified Farlie-Gumbel-Morgenstern copula”, “Ali-Mikhail-Haq copula” and “Renyi’s entropy copula” are investigated. Certain characterizations based on two truncated moments and the reverse hazard function are presented. We briefly describe seven classical methods for estimating the unknown parameters: the maximum likelihood, Cramér-von Mises, ordinary least squares, weighted least squares, Anderson-Darling, right-tail Anderson-Darling and left-tail Anderson-Darling methods. Monte Carlo simulation experiments are performed to compare the performances of these estimation methods for both small and large samples. Two different applications to real-life data sets are presented to illustrate the applicability and importance of the PTLG-G family. For the two real data sets, the “initial density shapes” are explored by the nonparametric kernel density function, the “normality condition” is checked by the “Quantile-Quantile plot”, the shape of the hazard rates is explored by the “total time in test” graphical tool, and the extremes are explored by the “box plots”. Based on the two applications, the PTLGL distribution gives the lowest values for all test statistics with AICr = 269.8712, CAICr = 270.6404, BICr = 282.0253 and HQICr = 274.7570 for the failure times data, and AICr = 208.582, CAICr = 209.6347, BICr = 218.2977 and HQICr = 212.7966 for the service times data among all fitted competitive models. 

In future work, many useful goodness-of-fit tests for right-censored validation, such as the Nikulin-Rao-Robson goodness-of-fit test and the Bagdonavicius-Nikulin goodness-of-fit test, can be applied, as performed by Ibrahim et al. (2019), Goual et al. (2019, 2020), Mansour et al. (2020a,b,c,d,e,f), Yadav et al. (2020), Goual and Yousof (2020), Aidi et al. (2021), Yadav et al. (2022) and Ibrahim et al. (2022), among others. 
Chapter 2 


A Novel Family of Continuous Distributions 
Properties, Characterizations, Statistical Modeling and 
Different Estimation Methods 


Haitham M Yousof, M Masoom Ali, Gauss M Cordeiro, GG Hamedani and Mohamed Ibrahim 


1. Introduction 


Statistical literature contains various G families of distributions which were generated either by compounding 
well-known existing G families or by adding one (or more) parameters to the existing classes. These novel 
families were employed for modeling real data in many applied areas such as engineering, insurance, 
demography, medicine, econometrics, biology and environmental sciences; see Cordeiro and de Castro 
(2011) (Kumaraswamy G family), Cordeiro et al. (2014) (the Lomax generator), Afify et al. (2016a) 
(transmuted geometric-G family), Afify et al. (2016b) (complementary geometric transmuted family), Aryal 
and Yousof (2017) (exponentiated generalized Poisson family), Brito et al. (2017) (Topp Leone odd log- 
logistic family), Yousof et al. (2017) (Burr X family), Cordeiro et al. (2018) (Burr XII family), Korkmaz 
et al. (2018) (exponential-Lindley odd log-logistic family) and Karamikabir et al. (2020) (Weibull Topp 
Leone generated family), Hamedani et al. (2021) and Hamedani et al. (2022) (type I quasi Lambert family 
and type II quasi Lambert family) among others. For other useful G families see Hamedani et al. (2017, 2018) 
and Hamedani et al. (2019). 

We propose and study a new family of distributions called the geometric generated Rayleigh (GcGR) family with a strong physical motivation. Let $g_V(x)$ and $G_V(x)$ denote the probability density function (PDF) and cumulative distribution function (CDF) of an arbitrary baseline model with parameter vector $V$ and consider the CDF of the generated Rayleigh (GR) family

$$H_{\beta,V}(x) = 1 - \exp\left[-V^2_{\beta,V}(x)\right], \quad x \in \mathbb{R},\ \beta > 0, \tag{1}$$

where $V_{\beta,V}(x) = G_V(x)^{\beta}/[1 - G_V(x)^{\beta}]$, $x \in \mathbb{R}$, $\beta > 0$, and $h_{\beta,V}(x) = dH_{\beta,V}(x)/dx$ is the PDF corresponding to (1). For any arbitrary baseline random variable (RV) having CDF $G_V(x)$, the CDF of the geometric G (Gc-G) family is defined by

$$F_{\theta,V}(x) = \frac{\theta\, G_V(x)}{1 - (1-\theta)\, G_V(x)}, \quad x \in \mathbb{R},\ \theta > 0. \tag{2}$$


By combining (1) and (2), we propose and study a new extension of the well-known Gc-G family that provides more flexibility. The new family has a strong physical motivation, as given in Section 2. It is derived by extending the geometric Rayleigh family with the generated odd ratio $V_{\beta,V}(x)$. After a quick study of its properties, different classical estimation methods under uncensored schemes are considered, namely the maximum likelihood (ML), Anderson-Darling (AD), ordinary least squares (OLS), Cramér-von Mises (CVM), weighted least squares (WLS), left-tail Anderson-Darling (LTAD) and right-tail Anderson-Darling (RTAD) methods. Numerical simulations are performed for comparing the estimation methods using different sample sizes for three different combinations of parameters. 

The GcGR family is also motivated by its flexibility in applications. By means of three applications, we show that the GcGR class provides better fits than many other families. The new family could be useful in modeling real data with an “asymmetric monotonically increasing hazard rate function (HRF)” as illustrated in Figure 1, Figure 4 and Figure 7 (1st row right panels); real data with some extreme values as shown in Figure 1 and Figure 4 (2nd row right and left panels); real data with no extreme values as shown in Figure 7 (2nd row right and left panels); real data whose nonparametric kernel density is asymmetric, bimodal and heavy tailed as illustrated in Figure 1 and Figure 4 (1st row left panels); real data whose nonparametric kernel density is symmetric and unimodal as illustrated in Figure 7 (1st row left panels); and real data which cannot be fitted by common theoretical distributions such as the normal, uniform, exponential, logistic, beta, lognormal and Weibull distributions as illustrated in Figure 1, Figure 4 and Figure 7 (3rd row right panels). 

The rest of the paper is organized as follows. In Section 2, we define the GcGR family and give a useful 
representation of its density function. In Section 3, we derive some of its mathematical properties. In Section 4, 
some characterization results are addressed. In Section 5, we present a special model corresponding to the 
baseline Fréchet distribution. Different classical estimation methods under uncensored schemes are addressed 
in Section 6. Numerical simulations are performed for comparing the estimation methods under different 
scenarios in Section 7. In Section 8, we provide three applications to real data to illustrate the flexibility of 
the new family. Finally, some concluding remarks are addressed in Section 9. 


2. The new family 

We use (1) and (2) to construct a new two-parameter family of continuous probability distributions, called the GcGR family, by taking (1) as the baseline CDF of (2). The CDF of the GcGR family can be defined by

$$F(x) = \frac{\theta - \theta\exp\left[-V^2_{\beta,V}(x)\right]}{1 - (1-\theta)\left\{1 - \exp\left[-V^2_{\beta,V}(x)\right]\right\}}, \quad x \in \mathbb{R},\ \theta > 0,\ \beta > 0, \tag{3}$$

where $\varsigma = (\theta, \beta, V)$ refers to the parameter vector. The new CDF in (3) can be used for presenting a new discrete G family for modeling count data (refer to Aboraya et al. (2020), Ibrahim et al. (2021), Chesneau et al. (2021) and Yousof et al. (2021) for more details). The PDF corresponding to (3) is

$$f(x) = 2\theta\beta\, \frac{g_V(x)\, G_V(x)^{2\beta-1}\left[1 - G_V(x)^{\beta}\right]^{-3}\exp\left[-V^2_{\beta,V}(x)\right]}{\left(1 - (1-\theta)\left\{1 - \exp\left[-V^2_{\beta,V}(x)\right]\right\}\right)^{2}}, \quad x \in \mathbb{R},\ \theta > 0,\ \beta > 0. \tag{4}$$
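
To make (3) and (4) concrete, the following Python sketch evaluates both functions for a user-supplied baseline. It assumes that (3) reads $F = \theta H/[1-(1-\theta)H]$ with $H = 1-\exp(-V^2)$ and $V = G^{\beta}/(1-G^{\beta})$, and it uses a unit exponential baseline purely as an illustrative assumption; the function names are hypothetical.

```python
import math

def odd_ratio(G_x, beta):
    """Generated odd ratio V = G^beta / (1 - G^beta) for a baseline CDF value G_x."""
    return G_x ** beta / (1.0 - G_x ** beta)

def gcgr_cdf(x, theta, beta, G):
    """GcGR CDF: theta*H / (1 - (1 - theta)*H) with H = 1 - exp(-V^2)."""
    H = 1.0 - math.exp(-odd_ratio(G(x), beta) ** 2)
    return theta * H / (1.0 - (1.0 - theta) * H)

def gcgr_pdf(x, theta, beta, G, g):
    """GcGR PDF obtained by differentiating the CDF above."""
    G_x = G(x)
    e = math.exp(-odd_ratio(G_x, beta) ** 2)
    num = (2.0 * theta * beta * g(x) * G_x ** (2.0 * beta - 1.0)
           * (1.0 - G_x ** beta) ** (-3.0) * e)
    den = (1.0 - (1.0 - theta) * (1.0 - e)) ** 2
    return num / den

# Illustrative baseline (an assumption): unit exponential CDF and PDF.
def G_exp(x):
    return 1.0 - math.exp(-x)

def g_exp(x):
    return math.exp(-x)

def numerical_derivative(x, theta, beta, step=1e-5):
    """Central difference of the CDF, used to check it against the PDF."""
    upper = gcgr_cdf(x + step, theta, beta, G_exp)
    lower = gcgr_cdf(x - step, theta, beta, G_exp)
    return (upper - lower) / (2.0 * step)
```

Comparing `numerical_derivative(x, theta, beta)` with `gcgr_pdf(x, theta, beta, G_exp, g_exp)` at a few points is a simple way to confirm that a coded PDF matches the coded CDF.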
The GcGR family is motivated by the following points. Suppose a system is made up of $N$ independent components in series, where $N$ is a RV with a standard geometric distribution and probability mass function (PMF)

$$p(N = n; \theta) = \theta(1-\theta)^{n-1}, \quad n \in \mathbb{N} \text{ and } \theta \in (0,1),$$

where $\mathbb{N} = \{1, 2, \ldots\}$. Suppose that the RVs $X_1, X_2, \ldots, X_N$ represent the lifetimes of the components, each following the GR family. Then,

$$Y = \min(X_1, X_2, \ldots, X_N)$$

represents the time to the first failure, with CDF (3) given $\theta \in (0,1)$. In a similar manner, consider now a parallel system with $N$ independent components and suppose that the RV $N$ has a geometric distribution with the PMF

$$P(N = n; \theta) = \frac{1}{\theta}\left[1 - \frac{1}{\theta}\right]^{n-1}, \quad n \in \mathbb{N} \text{ and } \theta > 1.$$

Let $X_1, X_2, \ldots, X_N$ be the lifetimes of the GR family components. Then,

$$T = \max(X_1, X_2, \ldots, X_N)$$

represents the lifetime of the system. Therefore, the RV $T$ follows (4) given $\theta > 1$. 

Some power series expansions for Equations (3) and (4) can be derived using the concept of the exponentiated G (Exp-G) family of distributions. Hereafter, for an arbitrary baseline CDF $G_V(x)$, we define a RV $Y_\beta$ having the Exp-G distribution with power parameter $\beta > 0$, say $Y_\beta \sim$ Exp-G$(\beta, V)$, if its PDF and CDF are given by $\pi_{\beta,V}(x) = \beta g_V(x) G_V(x)^{\beta-1}$ and $\Pi_{\beta,V}(x) = G_V(x)^{\beta}$, respectively. Using the generalized binomial expansion and the power series, the PDF in (4) can be expressed as

$$f(x) = 2\theta\beta\, \frac{g_V(x)\, G_V(x)^{2\beta-1}}{\left[1 - G_V(x)^{\beta}\right]^{3}} \sum_{i=0}^{\infty}\sum_{j=0}^{i} (-1)^{j}(1+i)(1-\theta)^{i}\binom{i}{j}\exp\left[-(1+j)V^2_{\beta,V}(x)\right].$$

Using the Taylor expansion, we can write

$$f(x) = \sum_{k,m=0}^{\infty} M_{k,m}\, \pi_{\beta^*,V}(x), \quad \beta^* = \beta[2(k+1)+m] > 0, \tag{5}$$

where

$$M_{k,m} = \frac{2\theta(-1)^{k+m}}{k!\,[2(k+1)+m]}\binom{-3-2k}{m}\sum_{i=0}^{\infty}\sum_{j=0}^{i}(-1)^{j}(1+i)(1-\theta)^{i}(1+j)^{k}\binom{i}{j}.$$

Thus, some mathematical properties of the GcGR family can be obtained simply from the properties of the Exp-G family. Equation (5) is the main result of this section. The CDF of the GcGR family can also be expressed as a mixture of Exp-G CDFs. By integrating (5), we obtain the same mixture representation

$$F(x) = \sum_{k,m=0}^{\infty} M_{k,m}\, \Pi_{\beta^*,V}(x), \tag{6}$$

where $\Pi_{\beta^*,V}(x)$ is the CDF of the Exp-G family with power parameter $\beta^* > 0$. 
3. Properties 
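The $r$-th moment of $X$, say $\mu'_{r,X}$, follows from (5) as $\mu'_{r,X} = E(X^r) = \sum_{k,m=0}^{\infty} M_{k,m}\, E(Y^r_{\beta^*})$.

Henceforth, $Y_{\beta^*}$ denotes a RV having the Exp-G distribution with positive power parameter $\beta^*$. We have $E(Y^r_{\beta^*}) = \beta^* \int_{-\infty}^{\infty} x^r g_V(x)\, G_V(x)^{\beta^*-1}\, dx$, which can be computed numerically in terms of the baseline quantile function (QF) $Q_{G,V}(u) = G_V^{-1}(u)$ as $E(Y^r_{\beta^*}) = \beta^* \int_0^1 Q_{G,V}(u)^r\, u^{\beta^*-1}\, du$. The variance, skewness and kurtosis measures can now be calculated through simple relations. Then, the moment generating function (MGF) $M_X(t) = E(\exp(tX))$ of $X$ can be derived from Equation (5) as

$$M_X(t) = \sum_{k,m=0}^{\infty} M_{k,m}\, M_{\beta^*}(t),$$

where $M_{\beta^*}(t)$ is the MGF of $Y_{\beta^*}$. Hence, $M_X(t)$ can be determined from the Exp-G generating function. The $s$-th incomplete moment, say $\phi_{s,X}(t)$, of $X$ can be expressed from (5) as

$$\phi_{s,X}(t) = \int_{-\infty}^{t} x^s f(x)\, dx = \sum_{k,m=0}^{\infty} M_{k,m}\, I_{-\infty}^{t}(x^s; \beta^*),$$

where $I_{-\infty}^{t}(x^s; \beta^*) = \int_{-\infty}^{t} x^s\, \pi_{\beta^*,V}(x)\, dx$. The $n$-th moment of the residual life, say $m_{n,X}(t) = E[(X-t)^n \mid X > t]$, $n = 1, 2, \ldots$, uniquely determines $F(x)$. The $n$-th moment of the residual life of $X$ is given by $m_{n,X}(t) = \frac{1}{1-F(t)}\int_t^{\infty}(x-t)^n\, dF(x)$. Then,

$$m_{n,X}(t) = \frac{1}{1-F(t)}\sum_{k,m=0}^{\infty}\sum_{r=0}^{n} M_{k,m}\binom{n}{r}(-t)^{n-r}\, I_{t}^{\infty}(x^r; \beta^*),$$

where $I_{t}^{\infty}(x^r; \beta^*) = \int_t^{\infty} x^r\, \pi_{\beta^*,V}(x)\, dx$. The $n$-th moment of the reversed residual life, say $M_{n,X}(t) = E[(t-X)^n \mid X \le t]$ for $t > 0$ and $n = 1, 2, \ldots$, follows as $M_{n,X}(t) = \frac{1}{F(t)}\int_0^t (t-x)^n\, dF(x)$. Therefore, the $n$-th moment of the reversed residual life of $X$ becomes

$$M_{n,X}(t) = \frac{1}{F(t)}\sum_{k,m=0}^{\infty}\sum_{r=0}^{n} M_{k,m}(-1)^r\binom{n}{r}\, t^{n-r}\, I_{0}^{t}(x^r; \beta^*),$$

where $I_{0}^{t}(x^r; \beta^*) = \int_0^t x^r\, \pi_{\beta^*,V}(x)\, dx$. 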
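
The quantile-function form of $E(Y^r_{\beta^*})$ used in this section is easy to evaluate numerically. The Python sketch below applies a simple midpoint rule to $\beta^*\int_0^1 Q(u)^r u^{\beta^*-1}du$. The function name, the choice of the midpoint rule and the uniform baseline in the example are assumptions made for illustration only, and the rule is only suitable as written when $\beta^* \ge 1$ and $Q$ is bounded on $(0,1)$.

```python
def expg_moment(Q, r, beta_star, n_grid=20000):
    """Approximate E(Y^r) = beta_star * integral_0^1 Q(u)^r u^(beta_star - 1) du.

    Q is the baseline quantile function; a midpoint rule on n_grid cells is used,
    so the end points u = 0 and u = 1 are never evaluated.
    """
    width = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) * width
        total += Q(u) ** r * u ** (beta_star - 1.0)
    return beta_star * total * width

# Illustrative baseline (an assumption): uniform on (0, 1), so Q(u) = u and
# the exact value is beta_star / (r + beta_star).
def uniform_quantile(u):
    return u

second_moment = expg_moment(uniform_quantile, r=2, beta_star=3.0)
```

For the uniform baseline the exact value $\beta^*/(r+\beta^*)$ is available, which makes it a convenient check before moving to a baseline such as the Fréchet model of Section 5.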


4. Characterizations of the GcGR family 


To understand the behavior of the data obtained through a given process, we need to be able to describe this 
behavior through its approximate probability law. This, however, requires establishing conditions which 
govern the required probability law. In other words, we need to have certain conditions under which we 
may be able to recover the data probability law. So, the characterization of a distribution is important in 
applied sciences, where an investigator is vitally interested in finding out if the model follows the selected 
distribution. Therefore, the investigator relies on conditions under which the model would follow a specified 
distribution. A probability distribution can be characterized in different ways, one of which is based on 
the truncated moments. This type of characterization was pioneered by Galambos and Kotz (1978) and followed 
by other authors such as Kotz and Shanbhag (1980), Glänzel et al. (1984), Glänzel (1987), Glänzel and 
Hamedani (2001) and Kim and Jeon (2013), to name a few. For example, Kim and Jeon (2013) proposed a 
credibility theory based on the truncation of the loss data to estimate conditional mean loss for a given risk 
function. It should also be mentioned that characterization results are mathematically challenging. In this 
section, we present certain characterizations of the GcGR distribution based on: (i) conditional expectation 
(truncated moment) of certain functions of a RV and (ii) reverse hazard function. 
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4.1 Characterizations based on two truncated moments 


This subsection is devoted to the characterizations of the GcGR distribution in terms of a simple relationship 
between two truncated moments. We will recall the Theorem of Glanzel (1987). As shown in Glanzel (1990), 
this characterization is stable in the sense of weak convergence. The first characterization given below can 
also be employed when the CDF does not have a closed form. 


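Proposition 4.1.1. Let $X : \Omega \to \mathbb{R}$ be a continuous RV and let

$$q_1(x) = \left[1 - (1-\theta)\left(1 - \exp\{-V^{2}_{\beta,V}(x)\}\right)\right]^{2}$$

and

$$q_2(x) = q_1(x)\exp\{-V^{2}_{\beta,V}(x)\}$$

for $x \in \mathbb{R}$. Then $X$ has PDF (4) if and only if the function $\xi$ defined in Theorem 1 is of the form

$$\xi(x) = \frac{1}{2}\exp\{-V^{2}_{\beta,V}(x)\}, \quad x \in \mathbb{R}.$$

Proof. If $X$ has PDF (4), then

$$(1 - F(x))\,E[q_1(X) \mid X \ge x] = \theta\exp\{-V^{2}_{\beta,V}(x)\}, \quad x \in \mathbb{R},$$

and

$$(1 - F(x))\,E[q_2(X) \mid X \ge x] = \frac{\theta}{2}\exp\{-2V^{2}_{\beta,V}(x)\}, \quad x \in \mathbb{R},$$

and hence

$$\xi(x) = \frac{1}{2}\exp\{-V^{2}_{\beta,V}(x)\}, \quad x \in \mathbb{R}.$$

We also have

$$\xi(x)q_1(x) - q_2(x) = -\frac{1}{2}\,q_1(x)\exp\{-V^{2}_{\beta,V}(x)\} < 0, \quad x \in \mathbb{R}.$$

Conversely, if $\xi$ is of the above form, then

$$s'(x) = \frac{\xi'(x)q_1(x)}{\xi(x)q_1(x) - q_2(x)} = 2\beta\,g_V(x)\,G_V^{2\beta-1}(x)\left[1 - G_V^{\beta}(x)\right]^{-3}, \quad x \in \mathbb{R}.$$

Now, according to Theorem 1, $X$ has density (4).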


Corollary 4.1.1. Suppose $X$ is a continuous RV. Let $q_1(x)$ be as in Proposition 4.1.1. Then $X$ has density (4)
if and only if there exist functions $q_2(x)$ and $\xi(x)$ defined in Theorem 1 for which the following first-order
differential equation holds:

$$\frac{\xi'(x)q_1(x)}{\xi(x)q_1(x) - q_2(x)} = 2\beta\,g_V(x)\,G_V^{2\beta-1}(x)\left[1 - G_V^{\beta}(x)\right]^{-3}, \quad x \in \mathbb{R}.$$

Corollary 4.1.2. The differential equation in Corollary 4.1.1 has the following general solution:

$$\xi(x) = \exp\{V^{2}_{\beta,V}(x)\}\left[-\int 2\beta\,g_V(x)\,G_V^{2\beta-1}(x)\left[1 - G_V^{\beta}(x)\right]^{-3}\exp\{-V^{2}_{\beta,V}(x)\}\,(q_1(x))^{-1}q_2(x)\,dx + D\right],$$

where $D$ is a constant. A set of functions satisfying the above differential equation is given in
Proposition 4.1.1 with $D = 0$. Clearly, there are other triplets $(q_1(x), q_2(x), \xi(x))$ satisfying the conditions of
Theorem 1.

4.2 Characterization in terms of the reverse hazard function


The reverse hazard function, $r_F(x)$, of a twice differentiable distribution function, $F$, is defined as

$$r_F(x) = \frac{f(x)}{F(x)}, \quad x \in \text{support of } F.$$

We present a characterization of the GcGR distribution in terms of the reverse hazard function.


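Proposition 4.2.1. Let $X : \Omega \to \mathbb{R}$ be a continuous RV. The RV $X$ has PDF (4) if and only if its reverse
hazard function $r_F(x)$ satisfies the following differential equation,

$$r_F'(x) - \frac{(2\beta-1)\,g_V(x)}{G_V(x)}\,r_F(x) = 2\beta\,G_V^{2\beta-1}(x)\,\frac{d}{dx}A_{\beta,V}(x), \quad x \in \mathbb{R},$$

together with a boundary condition, where

$$A_{\beta,V}(x) = g_V(x)\,\frac{\left[1 - G_V^{\beta}(x)\right]^{-3}\exp\{-V^{2}_{\beta,V}(x)\}\left(1 - \exp\{-V^{2}_{\beta,V}(x)\}\right)^{-1}}{1 - (1-\theta)\left(1 - \exp\{-V^{2}_{\beta,V}(x)\}\right)}.$$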
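As a numerical illustration of the definition $r_F(x) = f(x)/F(x)$, the following Python sketch evaluates the reverse hazard function in closed form and compares it with a finite-difference derivative of $\log F(x)$. It is only a sketch under stated assumptions: the CDF is taken to be $\theta u/[1-(1-\theta)u]$ with $u = 1-\exp\{-V^2\}$ and $V = G^{\beta}/(1-G^{\beta})$, the baseline is a Fréchet CDF $G(x) = \exp[-(a_1/x)^{a_2}]$ chosen purely as an example, and all function names are hypothetical.

```python
import math

def gcgr_fr_cdf(x, theta, beta, a1, a2):
    """Assumed CDF: theta*u / (1 - (1 - theta)*u), with u = 1 - exp(-V^2)."""
    t = beta * (a1 / x) ** a2      # t = -beta * log G(x) for the Frechet baseline
    v = 1.0 / math.expm1(t)        # V = G^beta / (1 - G^beta)
    u = -math.expm1(-v * v)        # u = 1 - exp(-V^2)
    return theta * u / (1.0 - (1.0 - theta) * u)

def gcgr_fr_reverse_hazard(x, theta, beta, a1, a2):
    """Closed-form r_F(x) = f(x) / F(x) under the same assumptions."""
    t = beta * (a1 / x) ** a2
    v = 1.0 / math.expm1(t)
    u = -math.expm1(-v * v)
    z = 1.0 - (1.0 - theta) * u
    # d(V^2)/dx = 2*beta*g*G^(2*beta-1) / (1 - G^beta)^3, written via t
    dv2 = 2.0 * a2 * (t / x) * math.exp(-2.0 * t) / (-math.expm1(-t)) ** 3
    return dv2 * math.exp(-v * v) / (u * z)

def numerical_reverse_hazard(x, theta, beta, a1, a2, h=1e-6):
    """Central difference of log F(x), which should agree with r_F(x)."""
    upper = math.log(gcgr_fr_cdf(x + h, theta, beta, a1, a2))
    lower = math.log(gcgr_fr_cdf(x - h, theta, beta, a1, a2))
    return (upper - lower) / (2.0 * h)
```

The two evaluations agree to several digits for moderate values of $x$, which is a quick sanity check on the closed form before it is used in a characterization argument.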


5. Special case 


The Fréchet (Fr) model is one of the most important distributions in modeling extreme values. It was
originally proposed by Fréchet (1927) and has many applications, for example, in accelerated life
testing, earthquakes, floods, wind speed, horse racing, rainfall, queues in supermarkets, and sea waves (see
Von Mises (1964) and Kotz and Johnson (1992)). More details about the Fr model can be found in the
literature. For example, Nadarajah and Kotz (2003) investigated the exponentiated Fr distribution,
Jahanshahi et al. (2019) defined and applied a new version of the Fr distribution, called the Burr X Fréchet
(BX-Fr) model, to relief times and survival times data, Krishna et al. (2013) proposed some applications of
the Marshall-Olkin Fr (MO-Fr) distribution, and Al-Babtain et al. (2020) investigated a new three-parameter Fr
model called the generalized odd generalized exponential Fréchet (GOFE-Fr) with mathematical properties
and applications, among others.

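A RV $X$ is said to have the Fr distribution if its PDF and CDF are given by

$$g_{a_1,a_2}(x) = a_2\,a_1^{a_2}\,x^{-(a_2+1)}\exp\left[-\left(\frac{a_1}{x}\right)^{a_2}\right], \quad x \ge 0,$$

and

$$G_{a_1,a_2}(x) = \exp\left[-\left(\frac{a_1}{x}\right)^{a_2}\right], \quad x \ge 0,$$

where $a_1 > 0$ is a scale parameter and $a_2 > 0$ is a shape parameter. Based on (3), the CDF of the geometric
generated Rayleigh Fréchet (GcGR-Fr) model can be defined by

$$F_{\zeta}(x) = \frac{\theta - \theta\exp\left\{-\left[\exp\left(\beta\left(\frac{a_1}{x}\right)^{a_2}\right) - 1\right]^{-2}\right\}}{1 - (1-\theta)\left(1 - \exp\left\{-\left[\exp\left(\beta\left(\frac{a_1}{x}\right)^{a_2}\right) - 1\right]^{-2}\right\}\right)}, \quad x \ge 0,\ \theta > 0,\ \beta > 0,$$

where $\zeta = (\theta, \beta, a_1, a_2)$ refers to the parameters vector. For the GcGR-Fr model, we obtain the following
results: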


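$$\mu'_{r,X} = E(X^{r}) = \sum_{k,m=0}^{\infty} \upsilon_{k,m}\,a_1^{r}\,\beta^{*\,r/a_2}\,\Gamma\left(1 - \frac{r}{a_2}\right), \quad a_2 > r,$$

$$\phi_{s,X}(t) = \sum_{k,m=0}^{\infty} \upsilon_{k,m}\,a_1^{s}\,\beta^{*\,s/a_2}\,\Gamma\left(1 - \frac{s}{a_2},\ \beta^{*}\left(\frac{a_1}{t}\right)^{a_2}\right), \quad a_2 > s,$$

$$m_{n,X}(t) = \frac{1}{1-F(t)}\sum_{k,m=0}^{\infty}\sum_{r=0}^{n} \upsilon_{k,m}\binom{n}{r}(-t)^{n-r}\,a_1^{r}\,\beta^{*\,r/a_2}\,\gamma\left(1 - \frac{r}{a_2},\ \beta^{*}\left(\frac{a_1}{t}\right)^{a_2}\right), \quad a_2 > n,$$

and

$$M_{n,X}(t) = \frac{1}{F(t)}\sum_{k,m=0}^{\infty}\sum_{r=0}^{n} \upsilon_{k,m}\,(-1)^{r}\binom{n}{r}\,t^{n-r}\,a_1^{r}\,\beta^{*\,r/a_2}\,\Gamma\left(1 - \frac{r}{a_2},\ \beta^{*}\left(\frac{a_1}{t}\right)^{a_2}\right), \quad a_2 > n,$$

where $\Gamma(\delta) = \int_{0}^{\infty} z^{\delta-1}\exp(-z)\,dz$ for $\delta > 0$, and $\gamma(\delta,p)$ refers to the lower incomplete gamma function

$$\gamma(\delta,p) = \int_{0}^{p} z^{\delta-1}\exp(-z)\,dz = \sum_{\kappa=0}^{\infty}\frac{(-1)^{\kappa}}{\kappa!\,(\delta+\kappa)}\,p^{\delta+\kappa}, \quad \delta \ne 0,-1,-2,\ldots$$

The function $\Gamma(\delta,p) = \int_{p}^{\infty} z^{\delta-1}\exp(-z)\,dz$, $p > 0$, with
$\Gamma(\delta,p) = \Gamma(\delta) - \gamma(\delta,p)$, refers to the upper incomplete gamma function.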
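The GcGR-Fr CDF above can be sketched in Python together with the matching PDF and quantile function. This is a minimal illustration, not the authors' implementation: it assumes the CDF form $\theta u/[1-(1-\theta)u]$ with $u = 1-\exp\{-V^2\}$ and $V = G^{\beta}/(1-G^{\beta})$, the function names are hypothetical, and no protection against overflow for extreme values of $x$ is included.

```python
import math

def gcgr_fr_cdf(x, theta, beta, a1, a2):
    """Assumed GcGR-Fr CDF for x > 0."""
    t = beta * (a1 / x) ** a2          # t = -beta * log G(x)
    v = 1.0 / math.expm1(t)            # V = G^beta / (1 - G^beta)
    u = -math.expm1(-v * v)            # u = 1 - exp(-V^2)
    return theta * u / (1.0 - (1.0 - theta) * u)

def gcgr_fr_pdf(x, theta, beta, a1, a2):
    """Derivative of the assumed CDF: theta * d(V^2)/dx * exp(-V^2) / z^2."""
    t = beta * (a1 / x) ** a2
    v = 1.0 / math.expm1(t)
    u = -math.expm1(-v * v)
    z = 1.0 - (1.0 - theta) * u
    dv2 = 2.0 * a2 * (t / x) * math.exp(-2.0 * t) / (-math.expm1(-t)) ** 3
    return theta * dv2 * math.exp(-v * v) / (z * z)

def gcgr_fr_quantile(p, theta, beta, a1, a2):
    """Inverse of the assumed CDF for 0 < p < 1 (usable for simulation)."""
    u = p / (theta + (1.0 - theta) * p)    # invert F = theta*u / (1 - (1-theta)*u)
    s = math.sqrt(-math.log1p(-u))         # invert u = 1 - exp(-V^2)
    t = math.log1p(1.0 / s)                # invert V = 1 / (exp(t) - 1)
    return a1 * (t / beta) ** (-1.0 / a2)  # invert t = beta * (a1/x)^a2
```

Feeding uniform random numbers through `gcgr_fr_quantile` gives one way to generate GcGR-Fr samples by inversion.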


6. Classical estimation under uncensored scheme 


We discuss seven methods to estimate the parameters of the GcGR family. They can be implemented using the
"AdequacyModel" package in the R software, which provides a general meta-heuristic optimization technique
for maximizing or minimizing an arbitrary objective function (Marinho et al. (2019)).


6.1 The ML method 


The maximum likelihood estimates (MLEs) enjoy desirable properties and can be used when constructing
confidence intervals. The normal approximation for these estimates in large sample theory is easily handled
either analytically or numerically. Here, we determine the MLEs of the parameters of the new family of
distributions from complete samples only. Let $\zeta = (\theta,\beta,V)^{\top}$ be the $p \times 1$ parameter vector. To obtain the MLE
of $\zeta$, the log-likelihood function can be expressed as

$$\ell(\zeta) = n\log(2) + n\log(\beta) + n\log(\theta) + \sum_{i=1}^{n}\log g_V(x_i) + (2\beta-1)\sum_{i=1}^{n}\log G_V(x_i)$$
$$\qquad - 3\sum_{i=1}^{n}\log\left[1 - G_V^{\beta}(x_i)\right] - \sum_{i=1}^{n} s_i^{2} - 2\sum_{i=1}^{n}\log z_i,$$

where $s_i = \frac{G_V^{\beta}(x_i)}{1 - G_V^{\beta}(x_i)}$ and $z_i = 1 - (1-\theta)[1 - \exp(-s_i^{2})]$. The function $\ell(\zeta)$ can be maximized either directly
by using R (optim function), SAS (PROC NLMIXED) or the Ox program (sub-routine MaxBFGS), or by
solving the nonlinear likelihood equations obtained by differentiating $\ell(\zeta)$. For interval estimation of the
model parameters, we require the observed information matrix. Under standard regularity conditions, when
$n \to \infty$, the distribution of $\hat{\zeta}$ can be approximated by a multivariate normal distribution to construct approximate
confidence intervals for the parameters.
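The log-likelihood above can be written out term by term. The following Python sketch does so for a Fréchet baseline with parameters $(a_1, a_2)$, which is an assumption made here for concreteness; the function name is hypothetical and the code only evaluates $\ell(\zeta)$, leaving the maximization to whichever optimizer is preferred.

```python
import math

def gcgr_fr_loglik(data, theta, beta, a1, a2):
    """Log-likelihood of Section 6.1 with a Frechet baseline (illustrative sketch)."""
    n = len(data)
    total = n * (math.log(2.0) + math.log(beta) + math.log(theta))
    for x in data:
        w = (a1 / x) ** a2                          # w = -log G(x)
        log_g = (math.log(a2) + a2 * math.log(a1)
                 - (a2 + 1.0) * math.log(x) - w)    # log of the Frechet PDF
        log_G = -w                                  # log of the Frechet CDF
        one_minus_Gb = -math.expm1(-beta * w)       # 1 - G^beta
        s = math.exp(-beta * w) / one_minus_Gb      # s_i = G^beta / (1 - G^beta)
        z = 1.0 - (1.0 - theta) * (-math.expm1(-s * s))
        total += (log_g + (2.0 * beta - 1.0) * log_G
                  - 3.0 * math.log(one_minus_Gb) - s * s - 2.0 * math.log(z))
    return total
```

In practice one would minimize the negative of this function over $(\theta, \beta, a_1, a_2)$, typically after a log-transformation of the parameters to keep them positive.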


6.2 The CVM method 


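The Cramér-von Mises estimates (CVMEs) of $\theta$, $\beta$ and $V$ are obtained by minimizing the following expression
with respect to these parameters:

$$\mathrm{CVM}(\zeta) = \frac{1}{12n} + \sum_{i=1}^{n}\left[F_{\zeta}(x_{i:n}) - c^{[1]}_{(i,n)}\right]^{2},$$

where $c^{[1]}_{(i,n)} = \frac{2i-1}{2n}$ and

$$F_{\zeta}(x_{i:n}) = \frac{\theta - \theta\exp\left[-V^{2}_{\beta,V}(x_{i:n})\right]}{1 - (1-\theta)\left\{1 - \exp\left[-V^{2}_{\beta,V}(x_{i:n})\right]\right\}}.$$

6.3 OLS method


Let $F_{\zeta}(x_{i:n})$ denote the CDF of the new family and let $x_{1:n} < x_{2:n} < \cdots < x_{n:n}$ be the $n$ ordered observations.
The ordinary least squares estimates (OLSEs) are obtained by minimizing

$$\mathrm{OLSE}(\zeta) = \sum_{i=1}^{n}\left[\frac{\theta - \theta\exp\left[-V^{2}_{\beta,V}(x_{i:n})\right]}{1 - (1-\theta)\left\{1 - \exp\left[-V^{2}_{\beta,V}(x_{i:n})\right]\right\}} - c^{[2]}_{(i,n)}\right]^{2}, \quad c^{[2]}_{(i,n)} = \frac{i}{n+1}.$$
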
6.4 WLS method 


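The weighted least squares estimates (WLSEs) are obtained by minimizing the function WLSE($\zeta$) with respect
to $\theta$, $\beta$ and $V$,

$$\mathrm{WLSE}(\zeta) = \sum_{i=1}^{n} c^{[3]}_{(i,n)}\left[F_{\zeta}(x_{i:n}) - c^{[2]}_{(i,n)}\right]^{2},$$

where $c^{[2]}_{(i,n)} = \frac{i}{n+1}$ and $c^{[3]}_{(i,n)} = \frac{(1+n)^{2}(2+n)}{i\,(1+n-i)}$.
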
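The three least-squares type criteria (CVM, OLS and WLS) differ only in their plotting positions and weights, which the following Python sketch makes explicit. The functions take a data set and any fitted CDF as a callable; the names are hypothetical and the code only evaluates the objective functions, so a numerical minimizer over the parameters would still be needed to obtain the estimates.

```python
def _ordered_cdf_values(data, cdf):
    """F(x_{1:n}), ..., F(x_{n:n}) for the ordered sample."""
    return [cdf(x) for x in sorted(data)]

def cvm_objective(data, cdf):
    """1/(12n) + sum_i [F(x_{i:n}) - (2i-1)/(2n)]^2."""
    f = _ordered_cdf_values(data, cdf)
    n = len(f)
    return 1.0 / (12.0 * n) + sum(
        (fi - (2 * i - 1) / (2.0 * n)) ** 2 for i, fi in enumerate(f, start=1))

def ols_objective(data, cdf):
    """sum_i [F(x_{i:n}) - i/(n+1)]^2."""
    f = _ordered_cdf_values(data, cdf)
    n = len(f)
    return sum((fi - i / (n + 1.0)) ** 2 for i, fi in enumerate(f, start=1))

def wls_objective(data, cdf):
    """sum_i w_i [F(x_{i:n}) - i/(n+1)]^2 with w_i = (n+1)^2 (n+2) / (i (n-i+1))."""
    f = _ordered_cdf_values(data, cdf)
    n = len(f)
    total = 0.0
    for i, fi in enumerate(f, start=1):
        weight = (n + 1.0) ** 2 * (n + 2.0) / (i * (n - i + 1.0))
        total += weight * (fi - i / (n + 1.0)) ** 2
    return total
```

For example, with a uniform CDF on $[0,1]$ and observations placed exactly at $i/(n+1)$, both the OLS and WLS criteria vanish, which is a simple check that the plotting positions are coded correctly.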


6.5 The AD method 


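The Anderson-Darling estimates (ADEs) are obtained by minimizing the function

$$\mathrm{ADE}(\zeta) = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left\{\log F_{\zeta}(x_{i:n}) + \log \bar{F}_{\zeta}(x_{n+1-i:n})\right\},$$

where $\bar{F}_{\zeta}(x_{n+1-i:n}) = 1 - F_{\zeta}(x_{n+1-i:n})$.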


6.6 The RTAD method 
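The right-tail Anderson-Darling estimates (RTADEs) are determined by minimizing

$$\mathrm{RT}_{(\mathrm{ADE})}(\zeta) = \frac{n}{2} - 2\sum_{i=1}^{n} F_{\zeta}(x_{i:n}) - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\log \bar{F}_{\zeta}(x_{n+1-i:n}),$$

where $\bar{F}_{\zeta}(\cdot) = 1 - F_{\zeta}(\cdot)$.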


6.7 The LTAD method 
The left-tail Anderson-Darling estimates (LTADEs) are obtained by minimizing

$$\mathrm{LT}_{(\mathrm{ADE})}(\zeta) = -\frac{3}{2}n + 2\sum_{i=1}^{n} F_{\zeta}(x_{i:n}) - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\log F_{\zeta}(x_{i:n}).$$

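The Anderson-Darling criterion and its right-tail and left-tail variants can be sketched in the same way. The Python functions below are illustrative only (hypothetical names, any fitted CDF passed as a callable); they evaluate the three objective functions, and a useful property to check is that the right-tail and left-tail criteria add up to the full Anderson-Darling criterion.

```python
import math

def _ordered_cdf_values(data, cdf):
    """F(x_{1:n}), ..., F(x_{n:n}) for the ordered sample."""
    return [cdf(x) for x in sorted(data)]

def ad_objective(data, cdf):
    """-n - (1/n) sum_i (2i-1) [log F(x_{i:n}) + log(1 - F(x_{n+1-i:n}))]."""
    f = _ordered_cdf_values(data, cdf)
    n = len(f)
    s = sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

def rtad_objective(data, cdf):
    """n/2 - 2 sum_i F(x_{i:n}) - (1/n) sum_i (2i-1) log(1 - F(x_{n+1-i:n}))."""
    f = _ordered_cdf_values(data, cdf)
    n = len(f)
    s = sum((2 * i - 1) * math.log(1.0 - f[n - i]) for i in range(1, n + 1))
    return n / 2.0 - 2.0 * sum(f) - s / n

def ltad_objective(data, cdf):
    """-3n/2 + 2 sum_i F(x_{i:n}) - (1/n) sum_i (2i-1) log F(x_{i:n})."""
    f = _ordered_cdf_values(data, cdf)
    n = len(f)
    s = sum((2 * i - 1) * math.log(f[i - 1]) for i in range(1, n + 1))
    return -1.5 * n + 2.0 * sum(f) - s / n
```

Because the tail-specific criteria weight only one side of the distribution, they may be preferred when the fit in that tail matters most.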
7. Comparing estimation methods 


7.1 A simulation study for comparing estimation methods 


A numerical simulation is performed to compare the classical estimation methods under the GcGR-Fr model. 
The simulation study is based on N = 1,000 generated data from the GcGR-Fr model with n = 50,100,150, 
and 300, and under the following three scenarios: 

Scenario I: ($\theta = 0.5$, $\beta = 0.6$, $a_1 = 0.9$, $a_2 = 0.8$),

Scenario II: ($\theta = 0.7$, $\beta = 0.7$, $a_1 = 0.7$, $a_2 = 0.7$),

Scenario III: ($\theta = 0.9$, $\beta = 0.8$, $a_1 = 0.5$, $a_2 = 0.3$).

The estimates are compared in terms of the root mean squared error (RMSE). The numbers
in Table 1 indicate that the RMSE for each parameter tends to zero as $n$ increases, which indicates the
consistency property.
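The mechanics of such a simulation can be sketched on a deliberately reduced problem. The Python code below is not the study reported in Table 1: it assumes the GcGR-Fr CDF form $\theta u/[1-(1-\theta)u]$ with $u = 1-\exp\{-V^2\}$, generates samples by inversion, and estimates only $\theta$ by maximum likelihood while $(\beta, a_1, a_2)$ are held at their true values. Under that simplification the likelihood equation for $\theta$ reduces to $\sum_i F_{\zeta}(x_i) = n/2$, which is solved here by bisection. All function names are hypothetical.

```python
import math
import random

def gcgr_fr_quantile(p, theta, beta, a1, a2):
    """Inverse of the assumed GcGR-Fr CDF for 0 < p < 1."""
    u = p / (theta + (1.0 - theta) * p)
    s = math.sqrt(-math.log1p(-u))
    t = math.log1p(1.0 / s)
    return a1 * (t / beta) ** (-1.0 / a2)

def rayleigh_part(x, beta, a1, a2):
    """u(x) = 1 - exp(-V(x)^2) with V = G^beta / (1 - G^beta)."""
    v = 1.0 / math.expm1(beta * (a1 / x) ** a2)
    return -math.expm1(-v * v)

def theta_mle_given_rest(data, beta, a1, a2):
    """MLE of theta with (beta, a1, a2) known: solves sum_i F_i(theta) = n/2."""
    us = [rayleigh_part(x, beta, a1, a2) for x in data]

    def score(theta):  # strictly decreasing in theta
        return len(us) / 2.0 - sum(theta * u / (1.0 - u + theta * u) for u in us)

    lo, hi = 1e-8, 1e8
    for _ in range(80):            # bisection on the log scale
        mid = math.sqrt(lo * hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def rmse_theta(n, reps, theta, beta, a1, a2, seed=1):
    """Monte Carlo RMSE of the theta estimate over `reps` samples of size n."""
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(reps):
        sample = [gcgr_fr_quantile(min(max(rng.random(), 1e-12), 1.0 - 1e-12),
                                   theta, beta, a1, a2) for _ in range(n)]
        sq_err += (theta_mle_given_rest(sample, beta, a1, a2) - theta) ** 2
    return math.sqrt(sq_err / reps)
```

Calling `rmse_theta` for increasing `n` under Scenario I shows the RMSE shrinking with the sample size, which is the behaviour that a consistency check looks for; the full study would instead estimate all four parameters with each of the seven methods.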


Table 1: Simulation results for comparing methods (RMSE).

Scenario I
n     Method   θ         β         a1        a2
20    ML       0.05475   0.00190   0.00668   0.02412
      OLS      0.05072   0.00216   0.00760   0.04411
      WLS      0.05523   0.00206   0.00726   0.04263
      CVM      0.05045   0.00215   0.00753   0.05006
      AD       0.04892   0.00196   0.00690   0.02474
      RTAD     0.04642   0.00184   0.00645   0.03007
      LTAD     0.06649   0.00266   0.00929   0.05512
50    ML       0.01625   0.00077   0.00272   0.00786
      OLS      0.01641   0.00091   0.00319   0.01643
      WLS      0.01796   0.00083   0.00294   0.01413
      CVM      0.01639   0.00090   0.00318   0.01741
      AD       0.01619   0.00083   0.00292   0.01004
      RTAD     0.01734   0.00080   0.00282   0.01232
      LTAD     0.01778   0.00105   0.00370   0.01821
100   ML       0.00787   0.000352  0.00123   0.00366
      OLS      0.00755   0.00044   0.00155   0.00761
      WLS      0.00823   0.00040   0.00140   0.00527
      CVM      0.00754   0.00044   0.00154   0.00788
      AD       0.00734   0.00039   0.00140   0.00469
      RTAD     0.00793   0.00038   0.00134   0.00570
      LTAD     0.00782   0.00051   0.00179   0.00811
200   ML       0.00386   0.00018   0.00063   0.00176
      OLS      0.00408   0.00024   0.00085   0.00418
      WLS      0.00438   0.00020   0.00071   0.00237
      CVM      0.00408   0.00024   0.00085   0.00421
      AD       0.00404   0.00022   0.00077   0.00248
      RTAD     0.00424   0.00020   0.00071   0.00259
      LTAD     0.00439   0.00029   0.00102   0.00451

Scenario II
n     Method   θ         β         a1        a2
20    ML       0.10719   0.00289   0.00592   0.01545
      OLS      0.10205   0.00355   0.00723   0.02306
      WLS      0.10174   0.00328   0.00673   0.03052
      CVM      0.10601   0.00364   0.00743   0.02443
      AD       0.10162   0.00329   0.00672   0.01638
      RTAD     0.10917   0.00322   0.00659   0.02587
      LTAD     0.11015   0.00401   0.00816   0.03624
50    ML       0.03390   0.00118   0.00240   0.00518
      OLS      0.03442   0.00150   0.00307   0.00901
      WLS      0.03563   0.00123   0.00251   0.00721
      CVM      0.03504   0.00146   0.00297   0.00815
      AD       0.03423   0.00131   0.00267   0.00607
      RTAD     0.03472   0.00122   0.00250   0.00837
      LTAD     0.03982   0.00173   0.00352   0.01241
100   ML       0.01526   0.00058   0.00119   0.00241
      OLS      0.01393   0.00064   0.00132   0.00416
      WLS      0.01726   0.00066   0.00134   0.00335
      CVM      0.01622   0.00074   0.00150   0.00407
      AD       0.03423   0.00131   0.00267   0.00607
      RTAD     0.01710   0.00064   0.00131   0.00431
      LTAD     0.01725   0.00085   0.00174   0.00611
200   ML       0.00786   0.00029   0.00059   0.00123
      OLS      0.00737   0.00035   0.00072   0.00188
      WLS      0.00860   0.00031   0.00064   0.00167
      CVM      0.00746   0.00036   0.00073   0.00198
      AD       0.00727   0.00032   0.00065   0.00150
      RTAD     0.00798   0.00032   0.00065   0.00224
      LTAD     0.00786   0.00041   0.00084   0.00288

Scenario III
n     Method   θ         β         a1        a2
20    ML       0.17067   0.00403   0.01824   0.00235
      OLS      0.15867   0.00504   0.02388   0.00420
      WLS      0.17851   0.00473   0.02240   0.00653
      CVM      0.16833   0.00531   0.02534   0.00409
      AD       0.16039   0.00473   0.02271   0.00240
      RTAD     0.16203   0.00454   0.02189   0.00465
      LTAD     0.20638   0.00623   0.02941   0.00479
50    ML       0.05045   0.00152   0.00661   0.00086
      OLS      0.05758   0.00201   0.00876   0.00145
      WLS      0.05479   0.00185   0.00857   0.00143
      CVM      0.05262   0.00199   0.00899   0.00147
      AD       0.04995   0.00184   0.00852   0.00094
      RTAD     0.05634   0.00178   0.00796   0.00170
      LTAD     0.05582   0.00227   0.01028   0.00174
100   ML       0.02503   0.00085   0.00376   0.00037
      OLS      0.02532   0.00104   0.00471   0.00075
      WLS      0.03098   0.00095   0.00418   0.00057
      CVM      0.02476   0.00096   0.00416   0.00073
      AD       0.02431   0.00091   0.00406   0.00050
      RTAD     0.02610   0.00086   0.00376   0.00075
      LTAD     0.02718   0.00111   0.00486   0.00087
200   ML       0.01208   0.00040   0.00172   0.00019
      OLS      0.01282   0.00052   0.00227   0.00035
      WLS      0.01447   0.00046   0.00201   0.00024
      CVM      0.01242   0.00050   0.00219   0.00035
      AD       0.01251   0.00047   0.00206   0.00025
      RTAD     0.01308   0.00045   0.00196   0.00037
      LTAD     0.01317   0.00057   0.00248   0.00043


7.2 Applications for comparing estimation methods 


In order to compare the estimation methods, we consider the Cramér-von Mises (C*) statistic, the Anderson-
Darling (A*) statistic, the Kolmogorov-Smirnov (KS) statistic and its corresponding p-value. These four
statistics are widely used to determine how closely a specific CDF fits the empirical distribution of a given
data set. The following data are considered. The 1st uncensored data set consists of 100 observations on the
breaking stress of carbon fibers (in GPa) given by Nichols and Padgett (2006). The 2nd data set refers to the
strengths of glass fibers reported by Smith and Naylor (1987). The 3rd data set, called "Wingo data", represents a
complete sample from a clinical trial described as relief times (in hours) for 50 arthritic patients. Table 2 gives
the results for all estimation methods using these three real data sets.

The numbers in Table 2 indicate that the ML method is the best method for estimating the unknown
parameters for the 1st data set with C* = 0.05923, A* = 0.43370, KS = 0.05832 and p-value = 0.88574. For
the 2nd data set, the ML method is the best for estimating the unknown parameters with C* = 0.04481 and
p-value = 0.82587. However, the OLS method is the best with KS = 0.06959 and C* = 0.04473. For the 3rd
data set, the LTAD method is the best method for estimating the unknown parameters with C* = 0.04740 and
A* = 0.40493. However, the OLS method is the best with KS = 0.06406 and p-value = 0.98645.
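To make the Kolmogorov-Smirnov comparison concrete, the following Python sketch computes the KS distance between a fitted CDF and the empirical distribution, together with a p-value from the classical asymptotic Kolmogorov series. It is an illustration only: the function names are hypothetical, and whether the reported p-values were obtained from this approximation or from an exact small-sample routine is an assumption that the text does not settle.

```python
import math

def ks_statistic(data, cdf):
    """D_n = max_i max(i/n - F(x_{i:n}), F(x_{i:n}) - (i-1)/n)."""
    f = [cdf(x) for x in sorted(data)]
    n = len(f)
    return max(max(i / n - fi, fi - (i - 1) / n)
               for i, fi in enumerate(f, start=1))

def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic p-value 2 * sum_k (-1)^(k-1) exp(-2 k^2 n d^2), clipped to [0, 1]."""
    lam = math.sqrt(n) * d
    if lam < 0.1:          # the series is numerically 1 for very small distances
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

For instance, `ks_pvalue_asymptotic(0.05832, 100)` evaluates to roughly 0.886, which is close to the p-value reported in Table 2 for the ML fit to the first data set.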


8. Comparing the competitive models 


For illustrating the wide flexibility of the GcGR-Fr model, we consider the previous statistics for model 
comparison. Table 3 reports some competitive models. 

Exploring real data can be done numerically, graphically, or with both techniques. We consider
several graphical techniques, such as the skewness-kurtosis plot (or the Cullen and Frey plot) for
exploring the initial fit to theoretical distributions such as the normal, uniform, exponential, logistic, beta,
lognormal and Weibull distributions. Bootstrapping is applied and plotted for more accuracy. The Cullen and
Frey plot compares distributions in the space of squared skewness and kurtosis, which provides a summary of
the properties of a distribution. Other graphical techniques are also considered, such as the "nonparametric
kernel density estimation (NKDE)" approach for exploring the initial density shape, the "Quantile-Quantile
(Q-Q)" plot for exploring the "normality" of the data, the "total time in test (TTT)" plot for exploring the
initial shape of the empirical HRFs, and the "box plot" and scattergrams for exploring the extremes.


Table 2: Comparing estimation methods.

For the 1st data set
Method   θ            β         a1          a2        C*        A*        KS        p-value
ML       910.24494    0.05233   105.84380   0.98874   0.05923   0.43370   0.05832   0.88574
OLS      215.77420    0.08418   34.39883    1.12237   0.05961   0.44788   0.07278   0.66462
WLS      480.03974    0.09176   41.35466    1.07116   0.06032   0.43905   0.06115   0.84866
CVM      570.58613    0.09413   48.92754    1.01985   0.05918   0.43658   0.06405   0.80656
AD       652.03622    0.15174   30.77710    1.02497   0.05953   0.43567   0.05987   0.86597
RTAD     372.49960    0.10113   43.95148    1.01228   0.05836   0.44189   0.07851   0.56863
LTAD     1840.81247   0.20254   40.22368    0.89669   0.05791   0.43099   0.06059   0.85643

For the 2nd data set
Method   θ            β         a1          a2        C*        A*        KS        p-value
ML       839.20873    0.14677   18.62822    1.25341   0.04481   0.34819   0.07906   0.82587
OLS      573042862    0.38244   8.35374     1.24981   0.04473   0.35496   0.06959   0.92046
WLS      650.61291    0.08583   45.75462    1.06944   0.04569   0.37974   0.09692   0.59487
CVM      434.59422    0.16560   14.05591    1.31446   0.04503   0.35521   0.07253   0.89471
AD       1232.04744   0.02387   123.55314   1.13980   0.04433   0.35320   0.06973   0.91929
RTAD     538.16030    0.08137   36.83841    1.14559   0.04535   0.36853   0.10302   0.51565
LTAD     1283.60868   0.20206   18.12401    1.16273   0.04438   0.35001   0.07511   0.86934

For the 3rd data set
Method   θ            β         a1          a2        C*        A*        KS        p-value
ML       0.37760      0.00839   21.01780    1.21289   0.04850   0.40140   0.08191   0.89060
OLS      0.41411      0.01257   20.65118    1.10861   0.04774   0.40555   0.06406   0.98645
WLS      0.38930      0.00786   20.74389    1.23507   0.04861   0.40168   0.09496   0.75806
CVM      0.44939      0.01095   20.25422    1.15889   0.04747   0.40348   0.06913   0.97067
AD       0.39734      0.00096   22.72119    1.15070   0.04790   0.40287   0.07341   0.95043
RTAD     0.37284      0.00916   23.51166    1.15051   0.04826   0.40318   0.07312   0.95201
LTAD     0.48905      0.01033   20.70115    1.17315   0.04740   0.40493   0.06861   0.97266


Table 3: Some competitive models.

Competitive models                        Abbreviation   Author(s)
Fréchet                                   Fr             Fréchet (1927)
Exponentiated-Fréchet                     E-Fr           Nadarajah and Kotz (2003)
Beta-Fréchet                              Beta-Fr        Barreto-Souza et al. (2011)
Marshall-Olkin-Fréchet                    MO-Fr          Krishna et al. (2013)
Transmuted-Fréchet                        T-Fr           Mahmoud and Mandouh (2013)
Kumaraswamy-Fréchet                       Kum-Fr         Mead and Abd-Eltawab (2014)
McDonald-Fréchet                          Mc-Fr          Shahbaz et al. (2012)
odd log-logistic-inverse Rayleigh         OLL-IR         -
odd log-logistic exponentiated-Fréchet    OLLE-Fr        -
odd log-logistic exponentiated IR         OLLE-IR        -
generalized odd log-logistic-IR           GOLL-IR        -


8.1 Comparing the competitive models under stress data 


The 1st uncensored data set consists of 100 observations on the breaking stress of carbon fibers (in GPa)
given by Nichols and Padgett (2006). Figure 1 gives the NKDE plot (1st row left panel), the TTT plot
(1st row right panel), the box plot (2nd row left panel), the Q-Q plot (2nd row right panel), the scattergram plot
(3rd row left panel), and the skewness-kurtosis plot (3rd row right panel). Based on Figure 1 (1st row left
panel), the breaking stress of carbon fibers data is asymmetric and bimodal with a heavy right
tail. Based on Figure 1 (1st row right panel), the HRF of the current data is monotonically
increasing. Based on Figure 1 (2nd row left panel and 2nd row right panel), this data includes
some extreme values. Based on Figure 1 (3rd row right panel), the current data cannot be
explained by theoretical distributions such as the normal, uniform, exponential, logistic, beta, lognormal and
Weibull distributions.

The statistics C*, A*, K-S and the p-value for all fitted models are presented in Table 4. The MLEs and
corresponding standard errors (SEs) are reported in Table 5. From Table 4, the GcGR-Fr model gives the lowest
values C* = 0.0612, A* = 0.4467 and K-S = 0.05887, and the highest p-value = 0.8789, as compared to the other
models. Therefore, the GcGR-Fr can be chosen as the best model. Figure 2 gives the estimated PDF and CDF.
Figure 3 gives the Probability-Probability (P-P) plot and estimated HRF for the current data. From Figure 2
and Figure 3, we note that the new GcGR-Fr model provides adequate fits to the empirical functions.


8.2 Comparing the competitive models with glass fibers data 


The 2nd data set refers to the strengths of glass fibers as given by Smith and Naylor (1987). Figure 4 gives
the NKDE plot (1st row left panel), the TTT plot (1st row right panel), the box plot (2nd row left panel), the
Q-Q plot (2nd row right panel), the scattergram plot (3rd row left panel), and the skewness-kurtosis plot (3rd
row right panel). Figure 4 (1st row left panel) indicates that the glass fibers data is asymmetric and bimodal
with a heavy right tail. Figure 4 (1st row right panel) indicates that the HRF of the glass fibers data is
monotonically increasing. Figure 4 (2nd row left panel and 2nd row right panel) shows that the glass fibers data
includes some extreme values. Figure 4 (3rd row right panel) indicates that the glass fibers data cannot be
explained by theoretical distributions such as the normal, uniform, exponential, logistic, beta, lognormal and
Weibull distributions.


Table 4: C*, A*, K-S and p-value for the breaking stress of carbon fibers data.

Model      C*       A*       K-S       p-value
GcGR-Fr    0.0612   0.4467   0.05887   0.8789
OB-Fr      0.0664   0.4706   0.0630    0.8220
OLLE-Fr    0.1203   0.9639   0.5561    < 0.0001
OLLE-IR    0.1553   1.2120   0.6550    < 0.0001
OLL-IR     0.1553   1.2120   0.6550    < 0.0001
Fr         0.1090   0.7657   0.0874    0.4282
Kum-Fr     0.0812   0.6217   0.0759    0.6118
E-Fr       0.1091   0.7658   0.0874    0.4287
Beta-Fr    0.0809   0.6207   0.0757    0.6147
T-Fr       0.0871   0.6209   0.0782    0.5734
MO-Fr      0.0886   0.6142   0.0763    0.5168
Mc-Fr      0.1333   1.0608   0.0807    0.5332


Table 5: MLEs and SEs for the breaking stress of carbon fibers data.

Model      Estimates (SEs in parentheses)
GcGR-Fr    250.215 (393.53)   0.02987 (0.0175)   74.1258 (68.922)   1.17293 (0.2228)
OB-Fr      5.1954 (0.001)     0.5990 (0.032)     1.0404 (0.044)     1.2324 (0.003)
OLLE-Fr    0.1351 (0.011)     3.7216 (0.0034)    0.9296 (0.0033)    21.319 (0.0034)
OLLE-IR    0.49460 (0.0414)   0.06743 (0.7195)   1.74262 (9.3007)
OLL-IR     0.49459 (0.04135)  0.45242 (0.03869)
Fr         1.3968 (0.0336)    4.3724 (0.3278)
Kum-Fr     0.8489 (16.083)    1.6239 (0.6979)    1.6341 (9.049)     3.4208 (0.7635)
E-Fr       0.9395 (3.543)     1.4169 (2.568)     0.9395 (0.3278)
Beta-Fr    0.7346 (1.5290)    1.5830 (0.7132)    1.6684 (0.7662)    3.5112 (0.9683)
T-Fr       -0.7166 (0.2616)   1.2656 (0.0579)    4.7121 (0.3657)
MO-Fr      0.0033 (0.0009)    6.2296 (1.0134)    1.2419 (0.1181)
Mc-Fr      0.8503 (0.1353)    44.423 (25.100)    19.859 (6.706)     0.0203 (0.0060)   46.974 (21.871)


The statistics C*, A*, K-S and the p-value for all fitted models are reported in Table 6. The MLEs and
corresponding SEs are given in Table 7. From Table 6, the GcGR-Fr model gives C* = 0.11304 and
A* = 0.89752, together with the lowest K-S = 0.12348 and the highest p-value = 0.2691, as compared to the other
models. Therefore, the GcGR-Fr distribution can be chosen as the best model. Figure 5 gives the estimated PDF
and CDF. Figure 6 gives the P-P plot and estimated HRF for the glass fibers data. Based on Figure 5 and
Figure 6, it is clear that the GcGR-Fr model provides adequate fits to the empirical functions.


8.3 Comparing the competitive models with the relief times data 


The 3rd data set is called "Wingo data" and represents a complete sample from a clinical trial described as
relief times (in hours) for 50 arthritic patients. Figure 7 gives the NKDE plot (1st row left panel), the TTT
plot (1st row right panel), the box plot (2nd row left panel), the Q-Q plot (2nd row right panel), the scattergram
plot (3rd row left panel), and the skewness-kurtosis plot (3rd row right panel). Figure 7 (1st row left panel)
indicates that the relief times data can be considered as symmetric data. Based on Figure 7 (1st row right
panel), the HRF of these data is monotonically increasing. Based on Figure 7 (2nd row left
panel and 2nd row right panel), it is clear that the relief times do not include any extreme values. Based
on Figure 7 (3rd row right panel), the relief times cannot be explained by theoretical
distributions such as the normal, uniform, exponential, logistic, beta, lognormal and Weibull distributions.

The statistics C*, A*, K-S and the p-value for all fitted models are reported in Table 8. The MLEs and
corresponding SEs are given in Table 9. From Table 8, the GcGR-Fr model gives the lowest values C* = 0.0485,
A* = 0.4014 and K-S = 0.08191, and the highest p-value = 0.8906. Therefore, the GcGR-Fr may be chosen as the
best model. Figure 8 displays the estimated PDF and CDF. Figure 9 gives the P-P plot and estimated HRF for the
relief times data. Based on Figure 8 and Figure 9, we note that the new GcGR-Fr model provides adequate fits to
the empirical functions.


Figure 1: Graphical description for the breaking stress of carbon fibers data.

Figure 3: P-P plot and estimated HRF for the breaking stress of carbon fibers data.


Table 6: C*, A*, K-S and p-value for the glass fibers data.

Model      C*        A*        K-S       p-value
GcGR-Fr    0.11304   0.89752   0.12348   0.2691
OLLE-Fr    0.10487   0.83250   0.55196   < 0.0001
OLLE-IR    0.15020   1.14697   0.67949   < 0.0001
OLL-IR     0.15021   1.14697   0.67951   < 0.0001


Figure 4: Graphical description for the glass fibers data.


Table 7: MLEs and SEs for the glass fibers data.

Model      Estimates (SEs in parentheses)
GcGR-Fr    14.89934 (6.2264)   0.00567 (0.00085)   53.55753 (11.9327)   1.5947538 (0.097052)
OLLE-Fr    0.1449 (0.0129)     0.00879 (0.000)     1.2997 (0.000)       24.878 (0.000)
OLLE-IR    0.5025 (0.0529)     0.0716 (1.13062)    1.7048 (13.47)
OLL-IR     0.50251 (0.052946)  0.45599 (0.048652)


Figure 6: P-P plot and estimated HRF for the glass fibers data.

Figure 7: Graphical description for the relief times data.


Table 8: C*, A*, K-S and p-value for the relief times data.

Model      C*       A*       K-S       p-value
GcGR-Fr    0.0485   0.4014   0.08191   0.8906
OB-Fr      0.0490   0.4208   0.09124   0.7994
GOLL-IR    0.1955   1.3498   0.11008   0.5797
OLLE-Fr    0.1577   1.0988   0.53498   < 0.0001
Fr         0.3233   2.0301   0.15062   0.2066
E-Fr       0.3233   2.0301   0.15061   0.2064
Beta-Fr    0.3611   2.5131   0.14334   0.3601
T-Fr       0.2823   1.8152   0.13701   0.3045


Table 9: MLEs and SEs for the relief times data.

Model      Estimates (SEs in parentheses)
GcGR-Fr    0.377602 (0.60784)   0.008388 (0.00152)   21.01781 (21.46850)   1.2128934 (0.456084)
OB-Fr      17.7905 (0.0001)     6.9955 (4.0355)      0.12686 (0.0002)      0.17843 (0.0004)
GOLL-IR    1.96132 (0.2340)     0.1112 (0.001)       1.41232 (0.005)
OLLE-Fr    0.0669 (0.0076)      0.00459 (0.0028)     0.3558 (0.0047)       32.561 (0.006)
Fr         0.4859 (0.0227)      3.2078 (0.3263)
E-Fr       0.9047 (18.784)      0.5013 (3.2444)      3.2077 (0.3263)
Beta-Fr    4.015 (0.111)        1.3349 (0.147)       2.0022 (0.321)        0.87017 (0.0033)
T-Fr       0.5816 (0.2787)      0.4400 (0.0290)      3.4974 (0.3527)


Figure 9: P-P plot and estimated HRF for the relief times data.


9. Conclusions


This paper presents a novel two-parameter G family of distributions. Relevant statistical properties such as
the ordinary moments, incomplete moments and generating function are derived. Special attention is devoted
to the standard Fréchet baseline model. Different classical estimation methods under uncensored schemes are
considered, namely the maximum likelihood, Anderson-Darling, ordinary least squares, Cramér-von Mises,
weighted least squares, left-tail Anderson-Darling, and right-tail Anderson-Darling methods. Numerical
simulations are performed for comparing the estimation methods. Moreover, all methods of estimation are
compared by means of three real data sets. The usefulness and flexibility of the proposed family are illustrated
by means of three applications to real data. The new family outperformed many well-known G families in these
applications, as shown below:


I. In modeling the breaking stress of carbon fibers, the new family is better than the odd Burr G family,
the odd log-logistic G family, the odd log-logistic exponentiated G family, the transmuted G family,
the Kumaraswamy G family, the exponentiated G family, the Beta G family, the McDonald G family and the
Marshall-Olkin G family under the Cramér-von Mises statistic, the Anderson-Darling statistic, the
Kolmogorov-Smirnov test statistic, and its corresponding p-value.


II. In modeling the glass fibers, the proposed family is better than the odd log-logistic G family and the
odd log-logistic exponentiated G family under the Kolmogorov-Smirnov test statistic and its
corresponding p-value.

III. In modeling the relief times, the new family is better than the odd Burr G family, the generalized odd
log-logistic G family, the odd log-logistic exponentiated G family, the exponentiated G family, the Beta G
family and the transmuted G family under the Cramér-von Mises statistic, the Anderson-Darling statistic, the
Kolmogorov-Smirnov test statistic, and its corresponding p-value.


As future work, we will consider new goodness-of-fit tests for right-censored validation,
such as the Nikulin-Rao-Robson goodness-of-fit test and the Bagdonavičius-Nikulin goodness-of-fit test, as
performed by Ibrahim et al. (2019), Goual et al. (2019, 2020), Mansour et al. (2020a-f), Yadav et al. (2020
and 2022), Goual and Yousof (2020), Aidi et al. (2021) and Ibrahim et al. (2022), among others. Some useful
reliability studies based on multicomponent stress-strength and the remaining stress-strength concepts can
be presented (Rasekhi et al. (2020), Saber et al. (2022a,b), Saber and Yousof (2022)). Some new acceptance
sampling plans based on the complementary geometric Weibull-G family or on some special members can be
presented in separate articles (see Ahmed and Yousof (2022) and Ahmed et al. (2022)).

Chapter 3


On the Use of Copulas to Construct
Univariate Generalized Families of
Continuous Distributions
Christophe Chesneau and Haitham M. Yousof


1. Introduction 


Many scientific disciplines are interested in modeling random multivariate events. However, the derivation 
of multivariate probability distributions that accurately model the variables at play is difficult. Copulas are 
a useful concept for dealing with this problem. As a first definition, a bivariate (or 2-dimensional) copula
is a bivariate function $C(u,v)$, $(u,v) \in [0,1]^2$, satisfying the following properties: for any $(u,v) \in [0,1]^2$,
$C(u,0) = C(0,v) = 0$, $C(u,1) = u$, $C(1,v) = v$, and for any $(u_1, u_2, v_1, v_2) \in [0,1]^4$ such that $u_1 \le u_2$ and $v_1 \le v_2$,
$C(u_2, v_2) - C(u_2, v_1) - C(u_1, v_2) + C(u_1, v_1) \ge 0$.

The major result on the concept of copulas is the Sklar theorem established in Sklar (1959). This 
concept is quite flexible; over time, a variety of copulas have been proposed. In particular, there are those 
of the families of Archimedean copulas, elliptical copulas and extreme value copulas. They contain various 
copulas, depending on one or more parameters, which have found a place of choice in many applications. In 
addition, some copulas of interest are independent of these families and have been at the origin of important 
innovations in multivariate modeling. This is the case with the famous Farlie-Gumbel-Morgenstern (FGM)
copula. For the theoretical aspects, we may refer to Nelsen (2006), Yong-Quan (2008), Georges et al. (2001),
Coles et al. (1999), Bekrizadeh et al. (2015), Bekrizadeh and Jamshidi (2017), Bekrizadeh et al. (2012),
Trivedi and Zimmer (2005), Chesneau (2021a) and Chesneau (2021b). Diverse application evidence can be
found in Frees and Valdez (1998), Georges et al. (2001), Kazianka and Pilz (2009), Zhang et al.
(2011), Thompson and Kilgore (2011) and Shiau et al. (2011).
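The defining properties of a bivariate copula can be checked numerically on a grid. The Python sketch below does this for the FGM copula in its standard one-parameter form $C(u,v) = uv[1 + \alpha(1-u)(1-v)]$ with $\alpha \in [-1,1]$; the function names are hypothetical, and a grid check is only a sanity test, not a proof.

```python
def fgm_copula(u, v, alpha):
    """Farlie-Gumbel-Morgenstern copula, valid for -1 <= alpha <= 1."""
    return u * v * (1.0 + alpha * (1.0 - u) * (1.0 - v))

def has_copula_margins(copula, grid_size=20, tol=1e-12):
    """Check C(u,0) = C(0,v) = 0, C(u,1) = u and C(1,v) = v on a grid."""
    pts = [i / grid_size for i in range(grid_size + 1)]
    return all(abs(copula(p, 0.0)) <= tol and abs(copula(0.0, p)) <= tol
               and abs(copula(p, 1.0) - p) <= tol
               and abs(copula(1.0, p) - p) <= tol for p in pts)

def is_two_increasing(copula, grid_size=20, tol=1e-12):
    """Check the rectangle inequality on every cell of a regular grid.

    Volumes of larger grid rectangles are sums of cell volumes, so checking
    adjacent grid points covers all rectangles with corners on the grid.
    """
    pts = [i / grid_size for i in range(grid_size + 1)]
    for u1, u2 in zip(pts[:-1], pts[1:]):
        for v1, v2 in zip(pts[:-1], pts[1:]):
            volume = (copula(u2, v2) - copula(u2, v1)
                      - copula(u1, v2) + copula(u1, v1))
            if volume < -tol:
                return False
    return True
```

With $\alpha$ inside $[-1,1]$ both checks pass, whereas a value such as $\alpha = 3$ violates the rectangle inequality near the corners of the unit square, which illustrates why the parameter range matters.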

In a completely different branch of probability and statistics, there is a high demand for new and original
families of univariate distributions. These families are used for the analysis of data sets in various applied
fields, as well as for the construction of various models, such as regression models, clustering models, and so
on. The compounding, integral, and mixing techniques, applied through various mathematical functions, are the
most commonly used for defining such families. In this regard, we may refer the reader to Bekrizadeh
et al. (2012), Bekrizadeh et al. (2015), Brito (2017), Chesneau (2021), Casella and Berger (1990), Chesneau
and Yousof (2021), Coles et al. (1999), Cordeiro et al. (2018), Jin and Shitan (2014), Korkmaz et al. (2018a,b
and 2020), Fischer and Klein (2007), Frees and Valdez (1998), Domma (2009), Karamikabir et al. (2020),
Kazianka and Pilz (2009), Eugene et al. (2002), Georges (2001), Bekrizadeh and
Jamshidi (2017), Aryal and Yousof (2017), Hamedani et al. (2017, 2018 and 2019), Nascimento et al. (2019),
Yousof et al. (2017, 2018 and 2020), Merovci et al. (2017 and 2020), Alizadeh et al. (2020a,b), Altun et al.
(2022) and El-Morshedy et al. (2021), among others.

In this article, we explore a new research approach that uses the flexible analytical properties of
copulas to create new general families of univariate distributions. These families are called Copula-G families.
As a primary remark, they may depend on two different baseline distributions,
parametric or not, and on one or more independent tuning parameters inherent to the definition of the copula.
We thus transpose some flexible dependence features of the copula to a new modeling perspective in the
univariate case. We develop this idea in a comprehensive manner. Some theoretical results are given in full
generality. Then, by selecting some families of particular interest, as well as interesting baselines, we present
two special members and determine some of their practical properties by a graphical approach. We highlight the
fact that they can be applied to analyze real-life data, as can other members of the family.

The paper is organized as follows. Section 2 lists several famous copulas. In
Section 3, we develop our idea and show how these copulas can serve as flexible generators of new families
of univariate distributions. Some comments conclude the paper in Section 4.


2. A list of known copulas 


In the literature, there is a plethora of copulas presenting different features and properties. The main aim
of this section is to present the general forms of some copulas depending on only one parameter, denoted
by $\theta$, often encountered in applications. The majority of them are listed in Nelsen (2006). We refer to the
table cited there for the possible values of $\theta$, which differ from one copula to another; they will be specified later
only for the copulas taken into account in our applications.


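1. The Archimedean copula 1 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \left[\left(u^{-\theta} + v^{-\theta} - 1\right)_+\right]^{-1/\theta}.$$
2. The Archimedean copula 2 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \left(1 - \left[(1-u)^{\theta} + (1-v)^{\theta}\right]^{1/\theta}\right)_+.$$
3. The Archimedean copula 3 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \frac{uv}{1 - \theta(1-u)(1-v)}.$$
4. The Archimedean copula 4 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \exp\left(-\left[(-\log u)^{\theta} + (-\log v)^{\theta}\right]^{1/\theta}\right).$$
5. The Archimedean copula 5 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = -\frac{1}{\theta}\log\left[1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\right].$$
6. The Archimedean copula 6 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = 1 - \left[(1-u)^{\theta} + (1-v)^{\theta} - (1-u)^{\theta}(1-v)^{\theta}\right]^{1/\theta}.$$
7. The Archimedean copula 7 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \left(\theta uv + (1-\theta)(u + v - 1)\right)_+.$$
8. The Archimedean copula 8 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \left[\frac{\theta^2 uv - (1-u)(1-v)}{\theta^2 - (\theta-1)^2(1-u)(1-v)}\right]_+.$$
9. The Archimedean copula 9 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = uv\exp\left[-\theta(\log u)(\log v)\right].$$
10. The Archimedean copula 10 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \frac{uv}{\left[1 + (1-u^{\theta})(1-v^{\theta})\right]^{1/\theta}}.$$
11. The Archimedean copula 11 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \left[\left(u^{\theta}v^{\theta} - 2(1-u^{\theta})(1-v^{\theta})\right)_+\right]^{1/\theta}.$$
12. The Archimedean copula 12 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \left(1 + \left[(u^{-1} - 1)^{\theta} + (v^{-1} - 1)^{\theta}\right]^{1/\theta}\right)^{-1}.$$
13. The Archimedean copula 13 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \exp\left(1 - \left[(1-\log u)^{\theta} + (1-\log v)^{\theta} - 1\right]^{1/\theta}\right).$$
14. The Archimedean copula 14 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \left(1 + \left[(u^{-1/\theta} - 1)^{\theta} + (v^{-1/\theta} - 1)^{\theta}\right]^{1/\theta}\right)^{-\theta}.$$
15. The Archimedean copula 15 of [Nelsen (2006) Table 4.1]:
$$C(u,v) = \left[\left(1 - \left[(1-u^{1/\theta})^{\theta} + (1-v^{1/\theta})^{\theta}\right]^{1/\theta}\right)_+\right]^{\theta}.$$
16. The FGM copula:
$$C(u,v) = uv\left[1 + \theta(1-u)(1-v)\right]. \tag{1}$$
17. The simple polynomial-sine (SPS) copula:
$$C(u,v) = uv + \frac{\theta}{\pi^2}\sin(\pi u)\sin(\pi v). \tag{2}$$
18. The power cosine copula (see Chesneau (2021a)):
$$C(u,v) = uv\left[1 + \theta\cos\left(\frac{\pi u}{2}\right)\cos\left(\frac{\pi v}{2}\right)\right].$$
19. The ratio extended FGM copula (see Chesneau (2021b)):
$$C(u,v) = uv\left[1 + \theta\,\frac{(1-u)(1-v)}{1 + \theta uv}\right].$$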
We recall that $(u,v) \in [0,1]^2$ in all cases, that $(a)_+ = \max(a,0)$ and that the domain of definition for $\theta$ can
change from one copula to another. Most of these copulas are involved in applications to describe dependence
structures between two joint random phenomena, such as those illustrated in the references mentioned in the
introductory section. All the copulas above are applicable to our idea of the Copula-G family, which
is developed in the next section.
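As a hedged illustration of the list above, the following sketch codes three of the copulas (numbers 1, 3 and 16) and checks numerically, on a grid, the boundary conditions and the rectangle inequality that define a copula. The function names, grid sizes and tolerances are assumptions made for this example only; a grid check can suggest, but never prove, that a function is a copula.

```python
def clayton(u, v, theta):
    """Archimedean copula 1 (Clayton); theta in [-1, inf), theta != 0."""
    if u == 0.0 or v == 0.0:
        return 0.0
    s = u ** (-theta) + v ** (-theta) - 1.0
    return max(s, 0.0) ** (-1.0 / theta)


def amh(u, v, theta):
    """Archimedean copula 3 (Ali-Mikhail-Haq); theta in [-1, 1)."""
    return u * v / (1.0 - theta * (1.0 - u) * (1.0 - v))


def fgm(u, v, theta):
    """FGM copula, Equation (1); theta in [-1, 1]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def has_copula_margins(C, theta, n=20, tol=1e-9):
    """Check C(u,0) = C(0,v) = 0, C(u,1) = u and C(1,v) = v on a grid."""
    grid = [i / n for i in range(n + 1)]
    return all(
        abs(C(t, 0.0, theta)) <= tol and abs(C(0.0, t, theta)) <= tol
        and abs(C(t, 1.0, theta) - t) <= tol and abs(C(1.0, t, theta) - t) <= tol
        for t in grid
    )


def is_two_increasing(C, theta, n=40, tol=1e-12):
    """Check the rectangle inequality on every cell of an n-by-n grid."""
    grid = [i / n for i in range(n + 1)]
    for i in range(n):
        for j in range(n):
            u1, u2, v1, v2 = grid[i], grid[i + 1], grid[j], grid[j + 1]
            vol = (C(u2, v2, theta) - C(u2, v1, theta)
                   - C(u1, v2, theta) + C(u1, v1, theta))
            if vol < -tol:
                return False
    return True
```

For instance, `is_two_increasing(fgm, 0.7)` holds, whereas a value of the parameter outside $[-1,1]$, such as `theta = 3.0`, violates the rectangle inequality for the FGM form.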


3. Copula-G families 


We now describe a new strategy for generating distributions based on the notion of copulas, which yields the
Copula-G families.


3.1 Main idea 


Starting with a copula $C(u,v)$, the idea of creating a new generalized family is as follows. Let $X$ and $Y$ be
two random variables defined on the domain $[0,1]$ with the copula $C(u,v)$ as a joint cumulative distribution
function (CDF), and let $W(x)$ and $H(x)$ be the CDFs of continuous univariate distributions. Let us introduce the
univariate random variable
$$Z = \max\left(W^{-1}(X), H^{-1}(Y)\right). \tag{3}$$
Since the event $\{Z \le x\}$ coincides with $\{X \le W(x),\ Y \le H(x)\}$, the CDF of $Z$ is given by
$$F(x) = C(W(x), H(x)), \quad x \in \mathbb{R}.$$
Thus defined, $F(x)$ is a valid univariate CDF, depending on $C(u,v)$, $W(x)$ and $H(x)$. The Copula-G families
are defined by $F(x)$. They depend on the three functions $C(u,v)$, $W(x)$ and $H(x)$, and the choice of these
functions drives the modeling ability of the Copula-G family. We thus link the univariate CDFs $W(x)$
and $H(x)$ through a copula strategy governed by $C(u,v)$.
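The construction in Equation (3) can be checked by simulation. The sketch below is only an illustration under stated assumptions: the copula is the FGM copula, sampled by conditional inversion, and both baselines are exponential (with hypothetical rates `alpha` and `b`) so that $W^{-1}$ and $H^{-1}$ have closed forms. It compares the empirical CDF of $Z$ with $C(W(x), H(x))$ at a few points.

```python
import math
import random


def fgm_copula(u, v, theta):
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def sample_fgm(theta, rng):
    """Draw (X, Y) on [0,1]^2 whose joint CDF is the FGM copula."""
    u = rng.random()
    t = rng.random()
    a = theta * (1.0 - 2.0 * u)
    # Conditional CDF of Y given X = u is v*(1 + a) - a*v**2; solve it for v.
    v = 2.0 * t / ((1.0 + a) + math.sqrt((1.0 + a) ** 2 - 4.0 * a * t))
    return u, v


def exp_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0


def exp_quantile(p, rate):
    p = min(p, 1.0 - 1e-12)  # guard against log(0)
    return -math.log(1.0 - p) / rate


def max_cdf_error(theta=0.8, alpha=1.0, b=2.0, n=20000, seed=12345,
                  points=(0.25, 0.5, 1.0, 2.0)):
    """Largest gap between the empirical CDF of Z and C(W(x), H(x))."""
    rng = random.Random(seed)
    zs = []
    for _ in range(n):
        x, y = sample_fgm(theta, rng)
        zs.append(max(exp_quantile(x, alpha), exp_quantile(y, b)))
    worst = 0.0
    for x0 in points:
        empirical = sum(z <= x0 for z in zs) / n
        theoretical = fgm_copula(exp_cdf(x0, alpha), exp_cdf(x0, b), theta)
        worst = max(worst, abs(empirical - theoretical))
    return worst
```

With a sample of this size, the gap returned by `max_cdf_error()` is expected to be of the order of the Monte Carlo error, that is, a few thousandths.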

The probability density function (PDF) related to $F(x)$ is obtained by differentiating $F(x)$ with
respect to $x$. By the chain rule, it is obtained as
$$f(x) = w(x)\,\frac{\partial}{\partial u}C(W(x), H(x)) + h(x)\,\frac{\partial}{\partial v}C(W(x), H(x)), \quad x \in \mathbb{R},$$
where $w(x)$ and $h(x)$ denote the PDFs related to $W(x)$ and $H(x)$, respectively. We emphasize that
$\partial C(W(x), H(x))/\partial u$ must be read as $\partial C(u,v)/\partial u$ evaluated at $(u,v) = (W(x), H(x))$.

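The hazard rate function (HRF) is given as $r(x) = f(x)/[1 - F(x)]$, $x \in \mathbb{R}$, which can be expressed via
$C(u,v)$, $W(x)$ and $H(x)$ as
$$r(x) = \frac{1}{1 - C(W(x), H(x))}\left[w(x)\,\frac{\partial}{\partial u}C(W(x), H(x)) + h(x)\,\frac{\partial}{\partial v}C(W(x), H(x))\right], \quad x \in \mathbb{R}.$$
The reversed hazard rate function is given as $r_{*}(x) = f(x)/F(x)$, $x \in \mathbb{R}$, which can be expressed as
$$r_{*}(x) = \frac{1}{C(W(x), H(x))}\left[w(x)\,\frac{\partial}{\partial u}C(W(x), H(x)) + h(x)\,\frac{\partial}{\partial v}C(W(x), H(x))\right], \quad x \in \mathbb{R}.$$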


These functions are the basis for a deep reliability study for given baseline distributions. 
The quantile function is theoretically defined as $Q(x) = F^{-1}(x)$, which has no closed form in our general
setting. For some simple baseline distributions, it can nevertheless be expressed in closed form.
The use of two baseline distributions may be too arbitrary a choice in practice. To simplify the situation
and fix the idea, a possible simple choice is $W(x) = H(x)$. In this case, and if the copula $C(u,v)$ is symmetric,
the main functions reduce to
$$F(x) = C(W(x), W(x)),$$
$$f(x) = 2w(x)\,\frac{\partial}{\partial u}C(W(x), W(x)),$$
$$r(x) = \frac{2w(x)\,\partial C(W(x), W(x))/\partial u}{1 - C(W(x), W(x))}.$$
These functions are quite manageable from the analytical viewpoint.


3.2 General theory 


This part is devoted to some theoretical results of the Copula-G families, under some realistic assumptions. 
First, assume that $C(u,v)$ can be expanded into a power series as
$$C(u,v) = \sum_{j,k=0}^{+\infty} a_{j,k}\,u^{j}v^{k}.$$
Such an expansion can be derived from the direct definition of the copula used (with possibly a lot of
vanishing coefficients producing finite sums), or from the Taylor theorem. Then, under this assumption, some
series expansions of important functions of the Copula-G family can be given. In particular, it is clear that
$$F(x) = \sum_{j,k=0}^{+\infty} a_{j,k}\,W(x)^{j}H(x)^{k}.$$
Provided that term-by-term differentiation of the series is mathematically valid, we immediately get
$$f(x) = \sum_{j,k=0}^{+\infty} a_{j,k}\left[j\,w(x)W(x)^{j-1}H(x)^{k} + k\,h(x)W(x)^{j}H(x)^{k-1}\right].$$
Some standard moment measures related to the Copula-G family can be derived from this formula. In
particular, provided that the sum and the integral can be interchanged, the $r$th moment of a random variable $X$ with the
CDF $F(x)$ is given as
$$m_o(r) = E(X^{r}) = \sum_{j,k=0}^{+\infty} a_{j,k}\left[\phi_{j,k}(r) + \eta_{j,k}(r)\right],$$
where
$$\phi_{j,k}(r) = j\int_{-\infty}^{+\infty} x^{r}\,w(x)W(x)^{j-1}H(x)^{k}\,dx$$
and
$$\eta_{j,k}(r) = k\int_{-\infty}^{+\infty} x^{r}\,h(x)W(x)^{j}H(x)^{k-1}\,dx.$$
The following approximation is thus valid, provided the integer $N$ is chosen large enough:
$$m_o(r) \approx \sum_{j,k=0}^{N} a_{j,k}\left[\phi_{j,k}(r) + \eta_{j,k}(r)\right].$$
The mean $m_o(1) = E(X)$, variance $V(X)$, skewness $S(X)$ and kurtosis $K(X)$ can be derived based on well-known
relationships. More generally, the $r$th incomplete moment is given as
$$m_o(r;t) = \sum_{j,k=0}^{+\infty} a_{j,k}\left[\phi_{j,k}(r;t) + \eta_{j,k}(r;t)\right],$$
where
$$\phi_{j,k}(r;t) = j\int_{-\infty}^{t} x^{r}\,w(x)W(x)^{j-1}H(x)^{k}\,dx$$
and
$$\eta_{j,k}(r;t) = k\int_{-\infty}^{t} x^{r}\,h(x)W(x)^{j}H(x)^{k-1}\,dx.$$


In particular, the first incomplete moment is involved in the definition of key probabilistic functions, 
such as the mean residual life and mean inactivity time. In economics, life insurance, health sciences, 
demographics, product quality control, and product technology, these functions are useful (Lai and Xie 2006). 
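The moment expansion of this subsection can be illustrated numerically. The sketch below is not the authors' code; it assumes the FGM copula, whose power series is finite with non-zero coefficients $a_{1,1} = 1+\theta$, $a_{2,1} = a_{1,2} = -\theta$ and $a_{2,2} = \theta$, an exponential baseline for $W(x)$, a Lindley baseline for $H(x)$, hypothetical parameter values, and a plain Simpson rule on a truncated range. It compares the series value of the $r$th moment with the direct integral of $x^{r}f(x)$.

```python
import math

ALPHA, BETA, THETA = 1.0, 2.0, 0.5  # hypothetical parameter values


def W(x):
    return 1.0 - math.exp(-ALPHA * x)


def w(x):
    return ALPHA * math.exp(-ALPHA * x)


def H(x):
    return 1.0 - (1.0 + BETA * x / (BETA + 1.0)) * math.exp(-BETA * x)


def h(x):
    return BETA ** 2 / (BETA + 1.0) * (1.0 + x) * math.exp(-BETA * x)


# Non-zero coefficients a_{j,k} of the FGM copula written as a power series.
A = {(1, 1): 1.0 + THETA, (2, 1): -THETA, (1, 2): -THETA, (2, 2): THETA}


def integrate(func, lo=0.0, hi=50.0, n=20000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * step)
    return total * step / 3.0


def moment_series(r):
    """r-th moment as the sum of a_{j,k} * (phi_{j,k}(r) + eta_{j,k}(r))."""
    total = 0.0
    for (j, k), a in A.items():
        phi = j * integrate(lambda x: x ** r * w(x) * W(x) ** (j - 1) * H(x) ** k)
        eta = k * integrate(lambda x: x ** r * h(x) * W(x) ** j * H(x) ** (k - 1))
        total += a * (phi + eta)
    return total


def pdf(x):
    """PDF of F(x) = C(W(x), H(x)) for the FGM copula."""
    Wb, Hb = 1.0 - W(x), 1.0 - H(x)
    return (H(x) * w(x) * (THETA * Hb * (Wb - W(x)) + 1.0)
            + W(x) * h(x) * (THETA * Wb * (Hb - H(x)) + 1.0))


def moment_direct(r):
    return integrate(lambda x: x ** r * pdf(x))
```

Calling `moment_series(0)` should return a value close to one, since the PDF integrates to one, and `moment_series(r)` should agree with `moment_direct(r)` up to rounding error.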


3.3 Examples of Copula-G family and members 


Two Copula-G families are described below. 


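I. As a first example, we can consider the FGM copula defined in Equation (1) with $\theta \in [-1,1]$. Then, the
following CDF is the result of the proposed strategy:
$$F(x) = W(x)H(x)\left[1 + \theta(1 - W(x))(1 - H(x))\right], \quad x \in \mathbb{R}. \tag{4}$$
The PDF related to $F(x)$ is given by
$$f(x) = H(x)w(x)\left[\theta\bar{H}(x)\left(\bar{W}(x) - W(x)\right) + 1\right] + W(x)h(x)\left[\theta\bar{W}(x)\left(\bar{H}(x) - H(x)\right) + 1\right], \quad x \in \mathbb{R},$$
where $\bar{W}(x) = 1 - W(x)$ and $\bar{H}(x) = 1 - H(x)$. We recall that $h(x)$ and $w(x)$ denote the PDFs related to
$H(x)$ and $W(x)$, respectively. The family defined by this CDF $F(x)$ is called the FGM-G family. The HRF is
obtained as
$$r(x) = \frac{H(x)w(x)\left[\theta\bar{H}(x)\left(\bar{W}(x) - W(x)\right) + 1\right] + W(x)h(x)\left[\theta\bar{W}(x)\left(\bar{H}(x) - H(x)\right) + 1\right]}{1 - W(x)H(x)\left[1 + \theta(1 - W(x))(1 - H(x))\right]}, \quad x \in \mathbb{R}.$$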


II. As a second example, we can consider the SPS copula defined in Equation (2) with $\theta \in [-1,1]$. Then, the
following CDF is the result of the proposed strategy:
$$F(x) = W(x)H(x) + \frac{\theta}{\pi^2}\sin[\pi W(x)]\sin[\pi H(x)], \quad x \in \mathbb{R}. \tag{5}$$
The PDF related to $F(x)$ is given by
$$f(x) = \frac{\theta}{\pi}\left(w(x)\cos[\pi W(x)]\sin[\pi H(x)] + h(x)\sin[\pi W(x)]\cos[\pi H(x)]\right) + H(x)w(x) + W(x)h(x), \quad x \in \mathbb{R}.$$
The related family is called the SPS-G family. The corresponding HRF is
$$r(x) = \frac{(\theta/\pi)\left(w(x)\cos[\pi W(x)]\sin[\pi H(x)] + h(x)\sin[\pi W(x)]\cos[\pi H(x)]\right) + H(x)w(x) + W(x)h(x)}{1 - W(x)H(x) - (\theta/\pi^2)\sin[\pi W(x)]\sin[\pi H(x)]}, \quad x \in \mathbb{R}.$$

The choices of $W(x)$ and $H(x)$ mainly depend on the context in which we want to apply the
Copula-G family. In the field of reliability, one can consider two of the main lifetime distributions in
the literature: the exponential and Lindley distributions.
Thus, for $W(x)$, we consider the CDF of the exponential distribution with parameter $\alpha > 0$ given as
$$W(x) = 1 - e^{-\alpha x}, \quad x > 0,$$
and $W(x) = 0$ for $x \le 0$. The corresponding PDF is specified by
$$w(x) = \alpha e^{-\alpha x}, \quad x > 0,$$
and $w(x) = 0$ for $x \le 0$. For $H(x)$, we consider the CDF of the Lindley distribution with parameter $\beta > 0$
given by
$$H(x) = 1 - \left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}, \quad x > 0,$$
and $H(x) = 0$ for $x \le 0$. The corresponding PDF is specified by
$$h(x) = \frac{\beta^2}{\beta + 1}(1 + x)e^{-\beta x}, \quad x > 0,$$
and $h(x) = 0$ for $x \le 0$. The Lindley distribution is defined as a special mixture of the exponential distribution
with parameter $\beta$ and the gamma distribution with parameters 2 and $\beta$. These distributions are lifetime distributions
which differ somewhat in their modeling properties. In particular, the HRF of the exponential distribution is constant,
whereas the HRF of the Lindley distribution is not. For details on the Lindley distribution, we refer to Tomy
(2018). With these baseline distributions, the two following three-parameter distributions can be introduced.

I. As a first example, based on the FGM-G family described by the CDF in Equation (4), and the exponential
and Lindley distributions, we consider the following CDF:
$$F(x) = \left(1 - e^{-\alpha x}\right)\left\{1 - \left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\right\}\left\{1 + \theta e^{-\alpha x}\left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\right\}, \quad x > 0,$$
and $F(x) = 0$ for $x \le 0$, with $\theta \in [-1,1]$, $\alpha > 0$ and $\beta > 0$. We call this three-parameter model the FGM-EL
distribution. The related PDF is given by
$$f(x) = \left\{1 - \left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\right\}\alpha e^{-\alpha x}\left\{\theta\left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\left(2e^{-\alpha x} - 1\right) + 1\right\}$$
$$\qquad + \left(1 - e^{-\alpha x}\right)\frac{\beta^2}{\beta + 1}(1 + x)e^{-\beta x}\left\{\theta e^{-\alpha x}\left(2\left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x} - 1\right) + 1\right\}, \quad x > 0,$$
and $f(x) = 0$ for $x \le 0$.
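A short sketch may help to check the FGM-EL formulas. It is only an illustration: the function names are hypothetical, and the code simply evaluates the CDF, PDF and HRF obtained by plugging the exponential and Lindley baselines into the FGM-G family, so that the PDF can be compared with a finite-difference derivative of the CDF.

```python
import math


def _survivals(x, alpha, beta):
    """Return (1 - W(x), 1 - H(x)) for the exponential and Lindley baselines."""
    w_bar = math.exp(-alpha * x)
    h_bar = (1.0 + beta * x / (beta + 1.0)) * math.exp(-beta * x)
    return w_bar, h_bar


def fgm_el_cdf(x, theta, alpha, beta):
    if x <= 0.0:
        return 0.0
    w_bar, h_bar = _survivals(x, alpha, beta)
    return (1.0 - w_bar) * (1.0 - h_bar) * (1.0 + theta * w_bar * h_bar)


def fgm_el_pdf(x, theta, alpha, beta):
    if x <= 0.0:
        return 0.0
    w_bar, h_bar = _survivals(x, alpha, beta)
    w = alpha * math.exp(-alpha * x)                                  # exponential PDF
    h = beta ** 2 / (beta + 1.0) * (1.0 + x) * math.exp(-beta * x)    # Lindley PDF
    return ((1.0 - h_bar) * w * (theta * h_bar * (2.0 * w_bar - 1.0) + 1.0)
            + (1.0 - w_bar) * h * (theta * w_bar * (2.0 * h_bar - 1.0) + 1.0))


def fgm_el_hrf(x, theta, alpha, beta):
    """Hazard rate f(x) / (1 - F(x))."""
    return fgm_el_pdf(x, theta, alpha, beta) / (1.0 - fgm_el_cdf(x, theta, alpha, beta))


def numerical_pdf(x, theta, alpha, beta, eps=1e-5):
    """Central finite difference of the CDF, used only as a check."""
    return (fgm_el_cdf(x + eps, theta, alpha, beta)
            - fgm_el_cdf(x - eps, theta, alpha, beta)) / (2.0 * eps)
```

The same pattern applies to the SPS-EL distribution once its CDF is coded in place of `fgm_el_cdf`.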


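II. As a second example, based on the SPS-G family described by the CDF in Equation (5), and the exponential
and Lindley distributions, we consider the following CDF:
$$F(x) = \left(1 - e^{-\alpha x}\right)\left\{1 - \left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\right\} + \frac{\theta}{\pi^2}\sin\left(\pi e^{-\alpha x}\right)\sin\left(\pi\left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\right), \quad x > 0,$$
and $F(x) = 0$ for $x \le 0$, with $\theta \in [-1,1]$, $\alpha > 0$ and $\beta > 0$. We call this three-parameter model the SPS-EL
distribution. The related PDF is given by
$$f(x) = -\frac{\theta}{\pi}\left\{\alpha e^{-\alpha x}\cos\left(\pi e^{-\alpha x}\right)\sin\left(\pi\left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\right) + \frac{\beta^2}{\beta + 1}(1 + x)e^{-\beta x}\sin\left(\pi e^{-\alpha x}\right)\cos\left(\pi\left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\right)\right\}$$
$$\qquad + \left\{1 - \left[1 + \frac{\beta x}{\beta + 1}\right]e^{-\beta x}\right\}\alpha e^{-\alpha x} + \left(1 - e^{-\alpha x}\right)\frac{\beta^2}{\beta + 1}(1 + x)e^{-\beta x}, \quad x > 0,$$
and $f(x) = 0$ for $x \le 0$. Here we have used $\sin(\pi(1 - a)) = \sin(\pi a)$ and $\cos(\pi(1 - a)) = -\cos(\pi a)$.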

Table 1 presents a numerical analysis of moment-type measures, $m_o(1)$, $V(X)$, $S(X)$ and $K(X)$, of the
FGM-EL distribution.

Based on the numerical results given in Table 1, it is noted that $m_o(1)$ decreases as $\theta$ increases and increases
as $\alpha$ increases, whereas its behavior in $\beta$ is not monotonic over the values considered. On the other hand, $S(X)$ is positive and ranges from 1.991305 to 66.15859,
and $K(X)$ ranges from 8.190794 to 5785.361 for the extreme case, so the distribution is
mainly leptokurtic. If data from a lifetime-type phenomenon are available, the maximum likelihood method
can be used to estimate the parameters of the FGM-EL and SPS-EL distributions. Then, by substituting the
obtained estimates into the CDF or PDF, we get the estimated CDF or PDF, respectively. We can assess
their fits graphically, by comparing the estimated CDF with the empirical CDF
and the estimated PDF with the shape of the histogram.


Table 1: $m_o(1)$, $V(X)$, $S(X)$ and $K(X)$ of the FGM-EL distribution for various values of the parameters.

θ       α       β      m_o(1)       V(X)         S(X)       K(X)
-0.99   0.5     50     0.01459327   0.00046747   1.991305   8.190794
-0.50   0.5     50     0.01231556   0.00039155   2.299100   10.03958
0       0.5     50     0.00999137   0.00030340   2.707796   13.08657
0.50    0.5     50     0.00766718   0.00020445   3.250578   18.50450
0.90    0.5     50     0.00580783   0.00011751   3.623670   25.06420
0.05    0.001   5      0.00022747   0.00010259   66.15859   5785.361
0.05    0.15    5      0.03208547   0.01303870   5.293313   40.01863
0.05    0.75    5      0.12675900   0.03360885   2.296001   10.31271
0.05    1       5      0.15412670   0.03402075   2.045883   9.017326
0.05    2       5      0.22020870   0.02162654   2.608773   10.51440
0.05    3       5      0.24663160   0.00727922   12.34179   26.00328
0.85    0.5     0.1    0.07437125   0.3514299    10.36599   135.0803
0.85    0.5     0.5    0.21458330   0.4795370    4.118848   23.51379
0.85    0.5     10     0.06303644   0.0103301    2.172377   9.068581
0.85    0.5     50     0.01394249   0.0004468    2.071928   8.640574
0.85    0.5     100    0.00704930   0.0001124    2.054335   8.609646


4. Conclusion 


This paper is devoted to a new idea for generating families of flexible distributions. The goal is to transfer
the well-known properties of copulas and their multidimensional flexibility to the univariate situation to create
new families of univariate distributions. These families can depend on two baseline distributions. Some
theoretical results are provided, with discussions. Finally, some of the families are defined, and some members
of interest are highlighted, to show how the proposed methodology can be used for modeling purposes. This
work opens the way to the creation of a plethora of Copula-G families that can inspire statisticians all
over the world.
Chapter 4


A Family of Continuous Probability Distributions
Theory, Characterizations, Properties and Different Copulas


Mohammad Mehdi Saber, GG Hamedani, Haitham M Yousof, Nadeem Shafique Butt,
Basma Ahmed and Mohamed Ibrahim


1. Introduction 


Statistical literature contains various G families of distributions which were generated either by compounding 
well-known existing G families or by adding one (or more) parameters to the existing classes. These novel 
families were employed for modeling real data in many applied areas such as engineering, insurance, 
demography, medicine, econometrics, biology, environmental sciences, and others; refer to Aryal and Yousof 
(2017) for exponentiated generalized Poisson-G family, Brito et al. (2017) for the Topp Leone odd log- 
logistic-G family, Yousof et al. (2017) for the Burr type X-G family, Cordeiro et al. (2018) for the Burr 
XII-G family, Korkmaz et al. (2018a and 2018b) for the exponential-Lindley odd log-logistic-G family and 
the Marshall—Olkin generalized G Poisson family of distributions, Karamikabir et al. (2020) for the Weibull 
Topp Leone generated-G family, Yousof et al. for the extended odd Fréchet-G family, Abouelmagd et al. 
(2019a and 2019b) for the Poisson Burr X-G family and the Topp-Leone Poisson-G family, Nascimento et al. 
(2019) for the odd Nadarajah-Haghighi-G family, Merovci et al. (2017 and 2020) for the exponentiated 
transmuted-G family and the Poisson Topp Leone-G family, Korkmaz et al. (2020) for the Hjorth’s IDB 
generator of distributions, Alizadeh et al. (2020a and 2020b) for flexible Weibull generated-G family of 
distributions and the transmuted odd log-logistic-G family, Hamedani et al. (2017, 2018, 2019, 2021) for the
type I general exponential class of distributions, the new extended-G family of continuous distributions, the
type II general exponential class of distributions and the type I quasi-Lambert family, Altun et al. (2021) for
the Gudermannian generated family of distributions, Chesneau and Yousof (2021) for the special generalized
mixture class of probabilistic models and El-Morshedy et al. (2021) for the Poisson generalized exponential-G
family, among others.

Let $g_V(x)$ and $G_V(x)$ denote the density and cumulative distribution functions of the baseline model with
the parameter vector $V$ and consider the Weibull CDF,
$$F_{\beta_2,\beta_3}(x) = 1 - \exp\left(-\frac{1}{\beta_3}x^{\beta_2}\right), \quad x \ge 0,$$
with positive parameters $\beta_2$ and $\beta_3$. Based on this CDF and using the argument
$$O_V(x) = \frac{1}{G_V(x)^{-1} - 1} = \frac{G_V(x)}{\bar{G}_V(x)},$$
Bourguignon et al. (2014) defined the CDF of their Weibull-G class by
$$H_{\beta_2,\beta_3,V}(x) = 1 - \exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right], \quad \beta_2, \beta_3 > 0 \text{ and } x \in \mathbb{R}. \tag{1}$$
The Weibull-G density function is given by
$$h_{\beta_2,\beta_3,V}(x) = \frac{\beta_2}{\beta_3}\,\frac{g_V(x)\,G_V(x)^{\beta_2 - 1}}{\bar{G}_V(x)^{\beta_2 + 1}}\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right], \quad \beta_2, \beta_3 > 0 \text{ and } x \in \mathbb{R}, \tag{2}$$
where $\bar{G}_V(x) = 1 - G_V(x)$. For a baseline random variable with probability density function (PDF) $g_V(x)$
and CDF $G_V(x)$, the complementary geometric-G (CGc-G) family is defined by the CDF
$$F_{\beta_1,V}(x) = \frac{\beta_1 G_V(x)}{1 - \bar{\beta}_1 G_V(x)}, \quad \bar{\beta}_1 = 1 - \beta_1 \text{ and } x \in \mathbb{R}, \tag{3}$$
and the PDF is given by
$$f_{\beta_1,V}(x) = \frac{\beta_1 g_V(x)}{\left[1 - \bar{\beta}_1 G_V(x)\right]^2}, \quad \bar{\beta}_1 = 1 - \beta_1 \text{ and } x \in \mathbb{R}, \tag{4}$$
where $\beta_1 > 0$. In this paper, we propose and study a new extension of the CGc-G family to provide more
flexibility to the generated family. We construct a new generator called the complementary geometric
Weibull-G (CGcW-G) family by taking the Weibull-G CDF in (1) as the baseline CDF in Equations (3) and
(4). Further, we give a comprehensive description of the mathematical properties of the new family. The
CGcW-G family is motivated by its flexibility in applications.


2. The new family 
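The CDF of the CGcW-G family is defined by
$$F_{\Phi}(x) = \frac{\beta_1\left\{1 - \exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]\right\}}{1 - \bar{\beta}_1\left\{1 - \exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]\right\}}, \quad \beta_1, \beta_2, \beta_3 > 0 \text{ and } x \in \mathbb{R}, \tag{5}$$
where $\Phi = (\beta_1, \beta_2, \beta_3, V)$ is the vector of parameters, $V$ being the parameter vector of the baseline $G_V(x)$. The corresponding PDF is
given by
$$f_{\Phi}(x) = \frac{\beta_1\beta_2\beta_3^{-1}\,g_V(x)\,G_V(x)^{\beta_2 - 1}\,\bar{G}_V(x)^{-\beta_2 - 1}\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]}{\left(1 - \bar{\beta}_1\left\{1 - \exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]\right\}\right)^2}, \quad \beta_1, \beta_2, \beta_3 > 0 \text{ and } x \in \mathbb{R}. \tag{6}$$

For $\beta_1 = 1$, the CGcW-G family reduces to the Weibull-G family (Bourguignon et al. (2014)). For
$\beta_1 = \beta_3 = 1$, the CGcW-G family reduces to the one parameter Weibull-G family (Bourguignon et al. (2014)).
For $\beta_1 = 1$ and $\beta_2 = 2$, the CGcW-G family reduces to the Rayleigh-G family (Bourguignon et al. (2014)). For
$\beta_1 = \beta_2 = 1$, the CGcW-G family reduces to the odd exponential-G family (Bourguignon et al. (2014)). For
$\beta_3 = 1$, the CGcW-G family reduces to the two parameter CGcW-G family.

Using the Taylor and generalized binomial expansions, the PDF in (6) can be expressed as
$$f_{\Phi}(x) = \sum_{k,m=0}^{\infty} c_{k,m}\,\pi_{\beta_2^{*}}(x), \quad x \in \mathbb{R}, \tag{7}$$
where $\beta_2^{*} = \beta_2(k + 1) + m$,
$$c_{k,m} = \frac{\beta_1\beta_2\,(-1)^{k+m}}{k!\,\beta_3^{k+1}\,\beta_2^{*}}\binom{-\beta_2(k+1) - 1}{m}\sum_{i=0}^{\infty}\sum_{j=0}^{i}(-1)^{j}(i + 1)(j + 1)^{k}\,\bar{\beta}_1^{\,i}\binom{i}{j},$$
and $\pi_{\beta_2^{*}}(x) = \beta_2^{*}\,g_V(x)\,G_V(x)^{\beta_2^{*} - 1}$ is the Exp-G PDF with power parameter $\beta_2^{*} > 0$. Thus, several mathematical
properties of the CGcW-G family can be obtained from those of the Exp-G family. Equation (7) is the main
result of this section. The CDF of the CGcW-G family can also be expressed as a mixture of Exp-G
CDFs. By integrating (7), we obtain the same mixture representation
$$F_{\Phi}(x) = \sum_{k,m=0}^{\infty} c_{k,m}\,\Pi_{\beta_2^{*}}(x), \quad x \in \mathbb{R}, \tag{8}$$
where $\Pi_{\beta_2^{*}}(x) = G_V(x)^{\beta_2^{*}}$ is the CDF of the Exp-G family with the power parameter $\beta_2^{*}$.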
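The CDF (5) and PDF (6) can be coded directly once a baseline is chosen. The sketch below is an illustration under assumptions: the baseline is a unit-rate exponential distribution (any continuous baseline could be passed instead), and the function names are hypothetical. It allows one to check numerically that (6) is the derivative of (5) and that $\beta_1 = 1$ gives back the Weibull-G CDF in (1).

```python
import math


def exp_baseline_cdf(x):
    """Hypothetical baseline G_V: exponential distribution with rate 1."""
    return 1.0 - math.exp(-x)


def exp_baseline_pdf(x):
    return math.exp(-x)


def weibull_g_cdf(x, b2, b3, G=exp_baseline_cdf):
    """Weibull-G CDF (1), with odds ratio O_V(x) = G / (1 - G)."""
    odds = G(x) / (1.0 - G(x))
    return 1.0 - math.exp(-odds ** b2 / b3)


def cgcw_cdf(x, b1, b2, b3, G=exp_baseline_cdf):
    """CGcW-G CDF (5): the CGc-G generator applied to the Weibull-G CDF."""
    u = weibull_g_cdf(x, b2, b3, G)
    return b1 * u / (1.0 - (1.0 - b1) * u)


def cgcw_pdf(x, b1, b2, b3, G=exp_baseline_cdf, g=exp_baseline_pdf):
    """CGcW-G PDF (6)."""
    Gx = G(x)
    e = math.exp(-(Gx / (1.0 - Gx)) ** b2 / b3)
    numerator = (b1 * b2 / b3 * g(x) * Gx ** (b2 - 1.0)
                 * (1.0 - Gx) ** (-b2 - 1.0) * e)
    return numerator / (1.0 - (1.0 - b1) * (1.0 - e)) ** 2


def cgcw_pdf_numeric(x, b1, b2, b3, eps=1e-6):
    """Central finite difference of the CDF, used only as a check."""
    return (cgcw_cdf(x + eps, b1, b2, b3) - cgcw_cdf(x - eps, b1, b2, b3)) / (2.0 * eps)
```

Note that these expressions are only numerically safe where $G_V(x) < 1$ in floating point, so very large arguments should be avoided with this baseline.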


3. Characterizations of the CGcW-G distribution 


Characterization of a distribution is important in applied sciences, where an investigator is vitally interested
in finding out whether their model follows the selected distribution. Therefore, the investigator relies on conditions
under which their model would follow a specified distribution. A probability distribution can be characterized
in different directions, one of which is based on truncated moments. It should also be mentioned that
characterization results are mathematically challenging and elegant. In this section, we present certain
characterizations of the CGcW-G distribution based on: (i) the conditional expectation (truncated moment) of a
certain function of a random variable and (ii) the reverse hazard function.


3.1 Characterizations based on a simple relationship between two truncated moments 


This subsection presents characterizations of the CGcW-G distribution in terms of a simple relationship
between two truncated moments. We employ a theorem by Glänzel (1987) given in Appendix A. As shown
in Glänzel (1990), this characterization is stable in the sense of weak convergence. The first characterization
given below can also be employed when the CDF does not have a closed form.
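Proposition 3.1.1. Let $X : \Omega \to \mathbb{R}$ be a continuous random variable and let
$$\Psi_1(x) = \left(1 - \bar{\beta}_1\left\{1 - \exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]\right\}\right)^2$$
and
$$\Psi_2(x) = \Psi_1(x)\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right] \quad \text{for } x \in \mathbb{R}.$$
Then $X$ has PDF (6) if and only if the function $\xi$ defined in Theorem 1 is of the form
$$\xi(x) = \frac{1}{2}\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right], \quad x \in \mathbb{R}.$$
Proof. If $X$ has PDF (6), then
$$(1 - F(x))\,E[\Psi_1(X) \mid X \ge x] = \beta_1\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right], \quad x \in \mathbb{R},$$
and
$$(1 - F(x))\,E[\Psi_2(X) \mid X \ge x] = \frac{\beta_1}{2}\exp\left[-\frac{2}{\beta_3}O_V(x)^{\beta_2}\right], \quad x \in \mathbb{R},$$
and hence
$$\xi(x) = \frac{1}{2}\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right], \quad x \in \mathbb{R}.$$
We also have
$$\xi(x)\Psi_1(x) - \Psi_2(x) = -\frac{1}{2}\Psi_1(x)\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right] < 0, \quad x \in \mathbb{R}.$$
Conversely, if $\xi(x)$ is of the above form, then
$$s'(x) = \frac{\xi'(x)\Psi_1(x)}{\xi(x)\Psi_1(x) - \Psi_2(x)} = \beta_2\beta_3^{-1}\,g_V(x)\,G_V(x)^{\beta_2 - 1}\left(1 - G_V(x)\right)^{-\beta_2 - 1}, \quad x \in \mathbb{R}.$$
Now, according to Theorem 1, $X$ has density (6).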


Corollary 3.1.1. Let $X$ be a continuous random variable and $\Psi_1(x)$ be as in Proposition 3.1.1. The PDF of
$X$ is (6) if and only if there exist functions $\Psi_2(x)$ and $\xi(x)$ defined in Theorem 1 for which the following first
order differential equation holds:
$$\frac{\xi'(x)\Psi_1(x)}{\xi(x)\Psi_1(x) - \Psi_2(x)} = \beta_2\beta_3^{-1}\,g_V(x)\,G_V(x)^{\beta_2 - 1}\left(1 - G_V(x)\right)^{-\beta_2 - 1}, \quad x \in \mathbb{R}.$$
Corollary 3.1.2. The differential equation in Corollary 3.1.1 has the following general solution:
$$\xi(x) = \exp\left[\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]\left[-\int \beta_2\beta_3^{-1}\,g_V(x)\,G_V(x)^{\beta_2 - 1}\left(1 - G_V(x)\right)^{-\beta_2 - 1}\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]\left[\Psi_1(x)\right]^{-1}\Psi_2(x)\,dx + D\right],$$
where $D$ is a constant. A set of functions satisfying the above differential equation is given in Proposition
3.1.1 with $D = 0$. Clearly, there are other triplets $(\Psi_1(x), \Psi_2(x), \xi(x))$ satisfying the conditions of Theorem 1.


3.2 Characterization based on reverse hazard function 


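The reverse hazard function, $r_F$, of a twice differentiable distribution function, $F$, is defined as
$$r_F(x) = \frac{f(x)}{F(x)}, \quad x \in \text{support of } F.$$

In this subsection we present a characterization of the CGcW-G distribution in terms of the reverse
hazard function.


Proposition 3.2.1. Let $X : \Omega \to \mathbb{R}$ be a continuous random variable. The random variable $X$ has PDF (6) if
and only if its reverse hazard function $r_F(x)$ satisfies the following differential equation,
$$r_F'(x) - \frac{g_V'(x)}{g_V(x)}\,r_F(x) = \beta_2\beta_3^{-1}\,g_V(x)\,\frac{d}{dx}\left\{\frac{G_V(x)^{\beta_2 - 1}\left(1 - G_V(x)\right)^{-\beta_2 - 1}\exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]}{\left\{1 - \exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]\right\}\left(1 - \bar{\beta}_1\left\{1 - \exp\left[-\frac{1}{\beta_3}O_V(x)^{\beta_2}\right]\right\}\right)}\right\}, \quad x \in \mathbb{R},$$
with boundary condition $\lim_{x \to \infty} r_F(x) = 0$ for $\beta_2 > 1$.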


4. Properties 


In this section, we study some general properties of the CGcW-G family of distributions. 


4.1 General properties 


The $r$th moment of $X$, say $\mu'_r$, follows from (7) as
$$\mu'_r = E(X^{r}) = \sum_{k,m=0}^{\infty} c_{k,m}\,E\left(Y_{\beta_2^{*}}^{r}\right). \tag{9}$$
Henceforth, $Y_{\beta_2^{*}}$ denotes a random variable having the Exp-G distribution with power parameter $\beta_2^{*}$. For $\beta_2^{*} > 0$, we have
$$E\left(Y_{\beta_2^{*}}^{r}\right) = \beta_2^{*}\int_{-\infty}^{\infty} x^{r}\,g_V(x)\,G_V(x)^{\beta_2^{*} - 1}\,dx,$$
which can be computed numerically in terms of the baseline quantile function (qf) $Q_{G,V}(u) = G_V^{-1}(u)$ as
$$E\left(Y_{\beta_2^{*}}^{r}\right) = \beta_2^{*}\int_{0}^{1} Q_{G,V}(u)^{r}\,u^{\beta_2^{*} - 1}\,du.$$

The variance, skewness, and kurtosis measures can now be calculated. Then, the MGF $M_X(t) = E(\exp(tX))$
of $X$ can be derived from Equation (7) as
$$M_X(t) = \sum_{k,m=0}^{\infty} c_{k,m}\,M_{\beta_2^{*}}(t),$$
where $M_{\beta_2^{*}}(t)$ is the MGF of $Y_{\beta_2^{*}}$.
Hence, $M_X(t)$ can be determined from the Exp-G generating function. The $n$th central moment of $X$, say
$M_n$, is given by
$$M_n = E\left(X - \mu'_1\right)^{n} = \sum_{r=0}^{n}\sum_{k,m=0}^{\infty}\binom{n}{r}c_{k,m}\,(-1)^{n-r}\left(\mu'_1\right)^{n-r}E\left(Y_{\beta_2^{*}}^{r}\right).$$
The $n$th descending factorial moment of $X$ (for $n = 1, 2, \ldots$) is
$$\mu'_{[n]} = E\left[X(X - 1)\times\cdots\times(X - n + 1)\right] = \sum_{j=0}^{n} s(n,j)\,\mu'_j,$$
where $s(n,j) = \frac{1}{j!}\left[\frac{d^{j}}{dx^{j}}\,x(x-1)\cdots(x-n+1)\right]_{x=0}$ is the Stirling number of the first kind. The $s$th incomplete moment, say $\omega_s(t)$,
of $X$ can be expressed from (7) as
$$\omega_s(t) = \int_{-\infty}^{t} x^{s} f_{\Phi}(x)\,dx = \sum_{k,m=0}^{\infty} c_{k,m}\int_{-\infty}^{t} x^{s}\,\pi_{\beta_2^{*}}(x)\,dx.$$

The mean deviations about the mean [$\delta_{\mu} = E(|X - \mu'_1|)$] and about the median [$\delta_{M} = E(|X - M|)$] of $X$ are
given by
$$\delta_{\mu} = 2\mu'_1 F(\mu'_1) - 2\omega_1(\mu'_1)$$
and
$$\delta_{M} = \mu'_1 - 2\omega_1(M),$$
respectively, where $\mu'_1 = E(X)$, $M = \text{Median}(X) = Q(1/2)$ is the median, $F(\mu'_1)$ is easily calculated from (5) and $\omega_1(t)$
is the first incomplete moment given by the expression above with $s = 1$. The general equation for $\omega_1(t)$ can be written as
$$\omega_1(t) = \sum_{k,m=0}^{\infty} c_{k,m}\,J_{\beta_2^{*}}(t),$$
where $J_{\beta_2^{*}}(t) = \int_{-\infty}^{t} x\,\pi_{\beta_2^{*}}(x)\,dx$ is the first incomplete moment of the Exp-G distribution. This equation for $\omega_1(t)$
can be applied to construct the Bonferroni and Lorenz curves, defined for a given probability $z$ by $B(z) = \omega_1(q)/(z\mu'_1)$
and $L(z) = \omega_1(q)/\mu'_1$, respectively, where $\mu'_1 = E(X)$ and $q = Q(z)$ is the qf of $X$ at $z$.

4.2 Probability weighted moments 


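The $(s,r)$th probability weighted moment (PWM) of $X$ following the CGcW-G family of distributions, say $\rho_{s,r}$, is
formally defined by
$$\rho_{s,r} = E\left\{X^{s}F_{\Phi}(X)^{r}\right\} = \int_{-\infty}^{\infty} x^{s}\,F_{\Phi}(x)^{r} f_{\Phi}(x)\,dx.$$
From Equations (5) and (6), for a non-negative integer $r$, we can write
$$f_{\Phi}(x)\,F_{\Phi}(x)^{r} = \sum_{k,m=0}^{\infty} d_{k,m}^{(r)}\,\pi_{\beta_2^{*}}(x),$$
where $\beta_2^{*} = \beta_2(k + 1) + m$ and
$$d_{k,m}^{(r)} = \frac{\beta_1^{r+1}\beta_2\,(-1)^{k+m}}{k!\,\beta_3^{k+1}\,\beta_2^{*}}\binom{-\beta_2(k+1) - 1}{m}\sum_{i=0}^{\infty}\sum_{j=0}^{r+i}(-1)^{j}(j + 1)^{k}\,\bar{\beta}_1^{\,i}\binom{r + 1 + i}{i}\binom{r + i}{j}.$$
Finally, the $(s,r)$th PWM of $X$ can be obtained from an infinite linear combination of Exp-G moments
given by
$$\rho_{s,r} = \sum_{k,m=0}^{\infty} d_{k,m}^{(r)}\,E\left(Y_{\beta_2^{*}}^{s}\right).$$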


4.3 Order statistics 


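Order statistics make their appearance in many areas of statistical theory and practice. Let $X_1, \ldots, X_n$ be a
random sample from the CGcW-G family of distributions. The PDF of the $i$th order statistic, say $X_{i:n}$, can be
written as
$$f_{i:n}(x) = \frac{f_{\Phi}(x)}{B(i, n - i + 1)}\sum_{j=0}^{n-i}(-1)^{j}\binom{n - i}{j}F_{\Phi}(x)^{j + i - 1}. \tag{10}$$
Using (5) and (6), we get
$$f_{\Phi}(x)\,F_{\Phi}(x)^{j + i - 1} = \sum_{k,m=0}^{\infty} t_{k,m}\,\pi_{\beta_2^{*}}(x), \tag{11}$$
where $\pi_{\beta_2^{*}}(x)$ is the Exp-G density with power parameter $\beta_2^{*} = \beta_2(k + 1) + m$ and
$$t_{k,m} = \frac{\beta_1^{j+i}\beta_2\,(-1)^{k+m}}{k!\,\beta_3^{k+1}\,\beta_2^{*}}\binom{-\beta_2(k+1) - 1}{m}\sum_{l=0}^{\infty}\sum_{p=0}^{j+i-1+l}(-1)^{p}(p + 1)^{k}\,\bar{\beta}_1^{\,l}\binom{j + i + l}{l}\binom{j + i - 1 + l}{p}.$$
Substituting (11) into Equation (10), the PDF of $X_{i:n}$ can be expressed as
$$f_{i:n}(x) = \sum_{k,m=0}^{\infty}\sum_{j=0}^{n-i}\frac{(-1)^{j}\,t_{k,m}}{B(i, n - i + 1)}\binom{n - i}{j}\pi_{\beta_2^{*}}(x). \tag{12}$$
Then, the density function of the CGcW-G order statistics is a mixture of Exp-G densities. Based on the
last equation, we note that the properties of $X_{i:n}$ follow from the properties of $Y_{\beta_2^{*}}$. For example, the moments
of $X_{i:n}$ can be expressed as
$$E\left(X_{i:n}^{q}\right) = \sum_{k,m=0}^{\infty}\sum_{j=0}^{n-i}\frac{(-1)^{j}\,t_{k,m}}{B(i, n - i + 1)}\binom{n - i}{j}E\left(Y_{\beta_2^{*}}^{q}\right). \tag{13}$$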
5. Estimation and inference


Several approaches for parameter estimation have been proposed in the literature, but the maximum likelihood
method is the most commonly employed. The maximum likelihood estimators (MLEs) enjoy desirable
properties and can be used when constructing confidence intervals and test statistics. The normal
approximation for these estimators in large sample theory is easily handled either analytically or numerically.
Here, we determine the MLEs of the parameters of the new family of distributions from
complete samples only. Let $x_1, \ldots, x_n$ be a random sample from the CGcW-G family with parameters $\beta_1$, $\beta_2$, $\beta_3$ and $V$.
Then, the log-likelihood function for $\Phi$, say $\ell = \ell(\Phi)$, is given by

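$$\ell = n\log\beta_1 + n\log\beta_2 + n\log\beta_3^{-1} + \sum_{i=1}^{n}\log g_V(x_i) + (\beta_2 - 1)\sum_{i=1}^{n}\log G_V(x_i) - (\beta_2 + 1)\sum_{i=1}^{n}\log\bar{G}_V(x_i) - \frac{1}{\beta_3}\sum_{i=1}^{n}s_i^{\beta_2} - 2\sum_{i=1}^{n}\log z_i,$$
where
$$s_i = G_V(x_i)/\bar{G}_V(x_i)$$
and
$$z_i = 1 + (\beta_1 - 1)\left[1 - \exp\left(-\frac{1}{\beta_3}s_i^{\beta_2}\right)\right].$$

The log-likelihood function can be maximized either directly by using R (optim function), SAS
(PROC NLMIXED) or the Ox program (sub-routine MaxBFGS), or by solving the nonlinear likelihood equations
obtained by differentiating $\ell$. The score vector components are given by
$$U_{\beta_1} = \frac{n}{\beta_1} - 2\sum_{i=1}^{n}\frac{w_i}{z_i},$$
$$U_{\beta_2} = \frac{n}{\beta_2} + \sum_{i=1}^{n}\log G_V(x_i) - \sum_{i=1}^{n}\log\bar{G}_V(x_i) - \frac{1}{\beta_3}\sum_{i=1}^{n}s_i^{\beta_2}\log s_i - 2\sum_{i=1}^{n}\frac{q_i}{z_i},$$
$$U_{\beta_3} = -n\beta_3^{-1} + \beta_3^{-2}\sum_{i=1}^{n}s_i^{\beta_2} + 2(\beta_1 - 1)\beta_3^{-2}\sum_{i=1}^{n}\frac{s_i^{\beta_2}}{z_i}\exp\left(-\frac{1}{\beta_3}s_i^{\beta_2}\right)$$
and
$$U_{V} = \sum_{i=1}^{n}\frac{g'_V(x_i)}{g_V(x_i)} + (\beta_2 - 1)\sum_{i=1}^{n}\frac{G'_V(x_i)}{G_V(x_i)} + (\beta_2 + 1)\sum_{i=1}^{n}\frac{G'_V(x_i)}{\bar{G}_V(x_i)} - \frac{\beta_2}{\beta_3}\sum_{i=1}^{n}s_i^{\beta_2 - 1}p_i - 2\sum_{i=1}^{n}\frac{d_i}{z_i},$$
where
$$g'_V(x_i) = \frac{\partial g_V(x_i)}{\partial V}, \quad G'_V(x_i) = \frac{\partial G_V(x_i)}{\partial V},$$
$$p_i = \left[G'_V(x_i)\bar{G}_V(x_i) + G_V(x_i)G'_V(x_i)\right]/\bar{G}_V(x_i)^2, \quad w_i = 1 - \exp\left(-\frac{1}{\beta_3}s_i^{\beta_2}\right),$$
$$q_i = \frac{\beta_1 - 1}{\beta_3}\,s_i^{\beta_2}\log(s_i)\exp\left(-\frac{1}{\beta_3}s_i^{\beta_2}\right)$$
and
$$d_i = \frac{(\beta_1 - 1)\beta_2}{\beta_3}\,s_i^{\beta_2 - 1}\,p_i\exp\left(-\frac{1}{\beta_3}s_i^{\beta_2}\right).$$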


Setting the nonlinear system of equations $U_{\beta_1} = U_{\beta_2} = U_{\beta_3} = U_{V} = 0$ and solving them simultaneously
yields the MLEs. These equations cannot be solved analytically, and statistical software can be used to
solve them numerically using iterative methods such as the Newton-Raphson type algorithms. For interval
estimation of the model parameters, we require the observed information matrix, whose elements are easily
derived. Under standard regularity conditions, when $n \to \infty$, the distribution of the estimator $\hat{\Phi}$ can be
approximated by a multivariate normal distribution with covariance matrix $J(\hat{\Phi})^{-1}$ to construct approximate confidence intervals for the
parameters. Here, $J(\hat{\Phi})$ is the total observed information matrix evaluated at $\hat{\Phi}$. The bootstrap re-sampling
method can be used for correcting the biases of the MLEs of the model parameters. Interval estimates
may also be obtained using the bootstrap percentile method. Likelihood ratio tests can be performed for the
proposed family of distributions in the usual way.
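The following sketch illustrates how the log-likelihood and one score component can be coded and cross-checked before being passed to an optimizer. It is an assumption-laden example rather than the authors' implementation: the baseline is a unit-rate exponential distribution with no unknown parameter, the data are hypothetical, and only the standard library is used.

```python
import math


def base_cdf(x):
    """Hypothetical baseline G_V: exponential distribution with rate 1."""
    return 1.0 - math.exp(-x)


def base_pdf(x):
    return math.exp(-x)


def _terms(x, b1, b2, b3):
    """Return (G, s_i, w_i, z_i) for one observation."""
    G = base_cdf(x)
    s = G / (1.0 - G)
    w = 1.0 - math.exp(-s ** b2 / b3)
    z = 1.0 + (b1 - 1.0) * w
    return G, s, w, z


def loglik(data, b1, b2, b3):
    """Log-likelihood of the CGcW-G family for a complete sample."""
    n = len(data)
    total = n * (math.log(b1) + math.log(b2) - math.log(b3))
    for x in data:
        G, s, w, z = _terms(x, b1, b2, b3)
        total += (math.log(base_pdf(x)) + (b2 - 1.0) * math.log(G)
                  - (b2 + 1.0) * math.log(1.0 - G)
                  - s ** b2 / b3 - 2.0 * math.log(z))
    return total


def score_b1(data, b1, b2, b3):
    """Score component n / beta_1 - 2 * sum(w_i / z_i)."""
    total = len(data) / b1
    for x in data:
        _, _, w, z = _terms(x, b1, b2, b3)
        total -= 2.0 * w / z
    return total


def cgcw_pdf(x, b1, b2, b3):
    """PDF (6), used here only to cross-check the log-likelihood."""
    G, s, w, z = _terms(x, b1, b2, b3)
    return (b1 * b2 / b3 * base_pdf(x) * G ** (b2 - 1.0)
            * (1.0 - G) ** (-b2 - 1.0) * (1.0 - w) / z ** 2)
```

In practice, `loglik` would be negated and handed to a numerical optimizer; comparing `score_b1` with a finite-difference derivative of `loglik` is a cheap way to catch algebra slips before doing so.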


6. Copulas 


For modeling bivariate real data sets, we shall derive some new bivariate CGCW-G (Bv-CGCW-G)
type distributions using the Farlie-Gumbel-Morgenstern copula (FGMC), the modified FGMC, the Clayton
copula, the Renyi entropy copula (REC) and the Ali-Mikhail-Haq copula (AMHC). The multivariate
CGCW-G (Mv-CGCW-G) type can be easily derived based on the Clayton copula. Future work
may be devoted to studying these new models. For more recent applications of some probability models see
Al-babtain et al. (2020), Salah et al. (2020), Elgohari and Yousof (2020a and 2020b), Ali et al. (2021a and
2021b), Shehata and Yousof (2021a and 2021b), Elgohari and Yousof (2021), Elgohari et al. (2021) and
Shehata et al. (2022).


6.1 BvCGCW-G type via Clayton copula 


Let X, ~ PTLG — G(@') and X, ~ PTLG — G(@?). Depending on the continuous marginals U/ = 1-Uand 
Y =1-Y, the Clayton copula can be considered as, 


CAAY =P) eV *=1) 0} 4), 


where, 
& €[-l1, ©) — {0}, WE (0,1) and VE (0,1) 
Let 
U=1-Fo (lo. V=1—-Fe, (2) lo, 
and 


1 
{1 = exp-ay {G yx)" [2 — Gy} )}. 


J-— eu 


Then, the BvCGCW-G type distribution can be obtained from C4( u,v ). 


Fol = 


1 
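As a rough illustration of how a bivariate CDF is built from a Clayton copula, the sketch below evaluates the standard Clayton form $C(u,v) = [\max(u^{-\theta} + v^{-\theta} - 1, 0)]^{-1/\theta}$ at two marginal probabilities. The exponential marginals (`exp_cdf`) and all function names are assumptions made for the example; in practice they would be replaced by the marginal functions of the family and by the copula arguments used in this section.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = [max(u^-theta + v^-theta - 1, 0)]^(-1/theta)."""
    if theta < -1.0 or theta == 0.0:
        raise ValueError("theta must lie in [-1, inf) with theta != 0")
    if not (0.0 < u <= 1.0 and 0.0 < v <= 1.0):
        raise ValueError("u and v must lie in (0, 1]")
    base = max(u ** (-theta) + v ** (-theta) - 1.0, 0.0)
    return base ** (-1.0 / theta) if base > 0.0 else 0.0


def exp_cdf(x, rate):
    """Placeholder marginal CDF (exponential), standing in for a fitted marginal."""
    return 1.0 - math.exp(-rate * x)


def joint_cdf(x1, x2, theta, rate1=1.0, rate2=2.0):
    """Bivariate CDF obtained by plugging the two marginals into the copula."""
    return clayton_cdf(exp_cdf(x1, rate1), exp_cdf(x2, rate2), theta)
```

The dependence parameter `theta` is kept separate from the marginal parameters, so the marginals can be changed without touching the copula code.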


6.2 BvCGCW-G type via REC 


The REC can be derived using the continuous marginal functions U/ = | — US Fo (x) € (0,1) and 
V=1-V= Fz) € (0,1) as follows: 


FX), X2) = Co, 1), Fp, (2) = x.U Fx, V = x1 Xp. 
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6.3 BvCGCW-G type via FGMC 
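Considering the FGMC, the joint CDF can be written as,

$$C_{\theta}(u,v) = uv\,(1 + \theta\,\bar{u}\,\bar{v}),$$

where, the continuous marginal functions $u \in (0,1)$, $v \in (0,1)$, $\bar{u} = 1-u$, $\bar{v} = 1-v$ and $\theta \in [-1,1]$.
Setting $\bar{u} = \bar{u}_{\Theta_1}$ and $\bar{v} = \bar{v}_{\Theta_2}$, we then have,

$$F(x_1,x_2) = uv\,(1 + \theta\,\bar{u}\,\bar{v}).$$

Then, the joint PDF can be expressed as,

$$c_{\theta}(u,v) = 1 + \theta\,u^{*}v^{*},$$

where, $u^{*} = 1-2u$ and $v^{*} = 1-2v$, or

$$f_{\theta}(x_1,x_2) = f_{\Theta_1}(x_1)\,f_{\Theta_2}(x_2)\,c_{\theta}\big(F_{\Theta_1}(x_1), F_{\Theta_2}(x_2)\big),$$

where the two functions $f_{\theta}(x_1,x_2)$ and $c_{\theta}(u,v)$ are the PDFs corresponding to the joint CDFs $F_{\theta}(x_1,x_2)$ and $C_{\theta}(u,v)$.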
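The relation between the FGM copula CDF and its density can be checked numerically. The sketch below assumes the standard FGM form $C(u,v) = uv[1 + \theta(1-u)(1-v)]$ with density $1 + \theta(1-2u)(1-2v)$, and compares the density with a finite-difference mixed partial derivative of the CDF. The function names and the step size are choices made for this example.

```python
def fgm_cdf(u, v, theta):
    """FGM copula CDF: u v [1 + theta (1 - u)(1 - v)], theta in [-1, 1]."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_pdf(u, v, theta):
    """FGM copula density: 1 + theta (1 - 2u)(1 - 2v)."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def mixed_partial(cdf, u, v, theta, h=1e-4):
    """Central finite-difference approximation of d^2 C / (du dv)."""
    return (cdf(u + h, v + h, theta) - cdf(u + h, v - h, theta)
            - cdf(u - h, v + h, theta) + cdf(u - h, v - h, theta)) / (4.0 * h * h)


# The density and the numerical mixed derivative should agree closely.
density = fgm_pdf(0.3, 0.7, 0.5)
numeric = mixed_partial(fgm_cdf, 0.3, 0.7, 0.5)
```

A joint density on the original scale then follows by multiplying `fgm_pdf` evaluated at the two marginal CDF values by the two marginal densities.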


6.4 BvCGCW-G type via modified FGMC


The modified FGMC can be expressed as $C_{\theta}(u,v) = uv + \theta\,\Phi(u)^{*}\,\Psi(v)^{*}$, with $\Phi(u)^{*} = u\,\Phi(u)$ and $\Psi(v)^{*} = v\,\Psi(v)$, where $\Phi(u) \in (0,1)$ and $\Psi(v) \in (0,1)$ are two continuous functions and $\Phi(u=0) = \Phi(u=1) = \Psi(v=0) = \Psi(v=1) = 0$. Then, the following four types can be derived and considered:


6.4.1 Type I
The new bivariate version via modified FGMC type I can be written as


$$C_{\theta}(u,v) = uv + \theta\,\Phi(u)^{*}\,\Psi(v)^{*}.$$


6.4.2 Type II
Consider A(U;&,) and B(V;4,) which satisfy the above conditions where, 


AU )|e1s0) = U4 (1- W)*. 
and 


BV; p)| e250) = V2 (I - Yr. 
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Then, the corresponding bivariate version (modified FGMC Type II) can be derived as 


C40,61,62 UV) =UV + &) WAYS |) BY,4>). 


6.4.3 Type III


—~~- — 


Let A) =U [log(1 +] q=1-2 and BO) = V [log(1 +V)] 149, Then, the associated CDF of the BvCGCW- 
G-FGM (modified FGMC type IID) is 


Cp UY) =U +W 6AW BO). 


6.4.4 Type IV 


Using the quantile concept, the CDF of the BvCGCW-G-FGM (modified FGMC type IV) model can be 
obtained as, 


CU,V) = UF" (U) + VFO (V)— Ft (YW) F" (V) 
where, F-'(U) = O(U) and F'(V) = O(Y). 


6.5 BvCGCW-G type via AMHC 


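Under the “stronger Lipschitz condition”, the joint CDF of the Archimedean Ali-Mikhail-Haq copula can be written as,

$$C_{\theta}(u,v) = \frac{uv}{1 - \theta\,\bar{u}\,\bar{v}}, \quad \theta \in (-1,1),$$

the corresponding joint PDF of the Archimedean Ali-Mikhail-Haq copula can be expressed as,

$$c_{\theta}(u,v) = \frac{1}{[1 - \theta\,\bar{u}\,\bar{v}]^{2}}\left(1 - \theta + \frac{2\theta\,uv}{1 - \theta\,\bar{u}\,\bar{v}}\right), \quad \theta \in (-1,1),$$

and then for any $\bar{u} = 1 - F_{\Theta_1}(x_1) \in (0,1)$ and $\bar{v} = 1 - F_{\Theta_2}(x_2) \in (0,1)$ we have,

$$C_{\theta}(x_1,x_2) = \frac{F_{\Theta_1}(x_1)\,F_{\Theta_2}(x_2)}{1 - \theta\,[1 - F_{\Theta_1}(x_1)][1 - F_{\Theta_2}(x_2)]}, \quad \theta \in (-1,1),$$

and

$$c_{\theta}(x_1,x_2) = \frac{1}{\{1 - \theta\,[1 - F_{\Theta_1}(x_1)][1 - F_{\Theta_2}(x_2)]\}^{2}} \times \left(1 - \theta + \frac{2\theta\,F_{\Theta_1}(x_1)\,F_{\Theta_2}(x_2)}{1 - \theta\,[1 - F_{\Theta_1}(x_1)][1 - F_{\Theta_2}(x_2)]}\right), \quad \theta \in (-1,1).$$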


7. Conclusions 


The present paper studied a new three-parameter compound family of probability distributions called the complementary geometric Weibull-G family. The relevant mathematical properties such as the ordinary moments, probability weighted moments and order statistics are derived and analyzed. The probability density function of the complementary geometric Weibull-G family is expressed as a mixture of exponentiated-G densities. We presented certain characterizations of the new family based on: (i) the conditional expectation (truncated moment) of certain functions of a random variable and (ii) the reverse hazard function. To facilitate the mathematical modeling of bivariate real data, we derived some new bivariate type extensions using the Farlie-Gumbel-Morgenstern, modified Farlie-Gumbel-Morgenstern, Clayton, Renyi and Ali-Mikhail-Haq copulas.

As potential future work, many new useful goodness-of-fit tests for distributional validity under right censoring, such as the Nikulin-Rao-Robson goodness-of-fit test, the modified Nikulin-Rao-Robson goodness-of-fit test, the Bagdonavicius-Nikulin goodness-of-fit test and the modified Bagdonavicius-Nikulin goodness-of-fit test, can be applied to the new family, as performed by Ibrahim et al. (2019), Goual et al. (2019, 2020), Mansour et al. (2020a-f), Yadav et al. (2020), Goual and Yousof (2020), Ibrahim et al. (2021), Yousof et al. (2021), Aidi et al. (2021) and Yousof et al. (2021a), among others.

Some new acceptance sampling plans based on the complementary geometric Weibull-G family, or on some of its special members, can be presented in separate articles (refer to Ahmed and Yousof (2022) and Ahmed et al. (2022)).

Some useful reliability studies based on the multicomponent stress-strength and the remaining stress-strength concepts can be presented (Rasekhi et al. (2020), Saber et al. (2022a,b) and Saber and Yousof (2022)).
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Appendix A 


Theorem 1. Let $(\Omega, \mathcal{F}, P)$ be a given probability space and let $H = [a,b]$ be an interval for some $a < b$ ($a = -\infty$, $b = \infty$ might as well be allowed). Let $X: \Omega \to H$ be a continuous random variable with the distribution function $F$ and let $Y_1(x)$ and $Y_2(x)$ be two real functions defined on $H$ such that,

$$E[Y_2(X) \mid X \ge x] = E[Y_1(X) \mid X \ge x]\,\xi(x), \quad x \in H,$$

is defined with some real function $\xi$. Assume that $Y_1(x), Y_2(x) \in C^{1}(H)$, $\xi(x) \in C^{2}(H)$ and $F$ is a twice continuously differentiable and strictly monotone function on the set $H$. Finally, assume that the equation $\xi Y_1 = Y_2$ has no real solution in the interior of $H$. Then $F$ is uniquely determined by the functions $Y_1(x)$, $Y_2(x)$ and $\xi(x)$, particularly,

$$F(x) = \int_a^x C \left| \frac{\xi'(u)}{\xi(u)\,Y_1(u) - Y_2(u)} \right| \exp(-s(u))\,du,$$

where the function $s(u)$ is a solution of the differential equation,

$$s'(u) = \frac{\xi'(u)\,Y_1(u)}{\xi(u)\,Y_1(u) - Y_2(u)},$$

and $C$ is the normalization constant, such that $\int_H dF = 1$.
Note: The goal is to have the function $\xi(x)$ as simple as possible.


Chapter 5 
New Odd Log-Logistic Family of 
Distributions 


Properties, Regression Models and Applications 
Emrah Altun,'* Morad Alizadeh,* Gamze Ozel’ and Haitham M Yousof* 


1. Introduction 


Statistical distributions are used to model and make predictions about the data in many applied sciences. 
However, known distributions, such as normal, gamma, and Weibull, are insufficient to provide conclusions 
about complex datasets. Therefore, many families of distributions have been proposed, especially in the last 
decade. Many complex data sets can be modeled with high accuracy thanks to these families of distributions. 
One of the most important generalizations of the log-logistic distribution was introduced by Gleaton and 
Lynch (2006) and described as the odd log-logistic (OLL) family of distributions. Several generalizations of 
the OLL family were studied by many authors such as Cordeiro et al. (2017), Alizadeh et al. (2017), Korkmaz 
et al. (2018), Alizadeh et al. (2018a), Alizadeh et al. (2018b), Korkmaz et al. (2019), Alizadeh et al. (2021), 
Rasekhi et al. (2021). Researchers continue to be interested in the generalizations of the known distributions. 
For instance, Kilai et al. (2022) studied a new generalization of the gull alpha power family which was 
originally introduced by Ijaz et al. (2020). Another generalization of the Gumbel-Weibull distribution was 
introduced by Osatohanmwen et al. (2022). In another interesting work, Omair et al. (2022) used the Whittaker function to define a new distribution.
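The cumulative distribution function (cdf) of the OLL family is,

$$F(x;\alpha,\xi) = \frac{G(x;\xi)^{\alpha}}{G(x;\xi)^{\alpha} + \bar{G}(x;\xi)^{\alpha}}, \qquad (1)$$

where, $\alpha$ is the shape parameter. The probability density function (pdf) of the cdf in (1) is,

$$f(x;\alpha,\xi) = \frac{\alpha\,g(x;\xi)\,G(x;\xi)^{\alpha-1}\,\bar{G}(x;\xi)^{\alpha-1}}{\left[G(x;\xi)^{\alpha} + \bar{G}(x;\xi)^{\alpha}\right]^{2}}. \qquad (2)$$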
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This paper proposes a new odd log-logistic family of distributions (NOLL-G) using the idea of Alzaatreh 
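et al. (2013). The cdf of the NOLL-G is derived by,

$$F(x;\alpha,\beta,\xi) = \int_0^{G(x;\xi)^{\alpha}/\bar{G}(x;\xi)^{\beta}} \frac{dt}{(1+t)^{2}} = \frac{G(x;\xi)^{\alpha}}{G(x;\xi)^{\alpha} + \bar{G}(x;\xi)^{\beta}}, \qquad (3)$$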


where, $\alpha > 0$ and $\beta > 0$ are two shape parameters, $\xi$ is the vector of parameters of the parent distribution $G(\cdot)$ and $\bar{G}(x;\xi) = 1 - G(x;\xi)$. The pdf and hazard rate of the NOLL-G are given by,

$$f(x;\alpha,\beta,\xi) = \frac{g(x;\xi)\,G(x;\xi)^{\alpha-1}\,\bar{G}(x;\xi)^{\beta-1}\left[\alpha + (\beta-\alpha)\,G(x;\xi)\right]}{\left[G(x;\xi)^{\alpha} + \bar{G}(x;\xi)^{\beta}\right]^{2}}, \qquad (4)$$

and

$$h(x;\alpha,\beta,\xi) = \frac{g(x;\xi)\,G(x;\xi)^{\alpha-1}\left[\alpha + (\beta-\alpha)\,G(x;\xi)\right]}{\bar{G}(x;\xi)\left[G(x;\xi)^{\alpha} + \bar{G}(x;\xi)^{\beta}\right]}. \qquad (5)$$

This family is denoted by $X \sim$ NOLL-G$(\alpha,\beta,\xi)$. The NOLL-G contains the OLL-G as its sub-model. When $\alpha = \beta$ in the pdf of the NOLL-G, we have the OLL-G. When $\alpha = \beta = 1$, we have the parent distribution $G(x;\xi)$.
The algorithm below generates random variables from the NOLL-G model.


Algorithm 


I. Generate $U \sim U(0,1)$.
II. Solve the non-linear equation $U = G(x;\xi)^{\alpha} / [G(x;\xi)^{\alpha} + \bar{G}(x;\xi)^{\beta}]$ for $x$.
III. Repeat steps I and II $N$ times.
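The three steps above can be sketched in a few lines. The version below is an illustration under stated assumptions: it solves step II on the probability scale, that is, for $p = G(x;\xi)$ by bisection, and then maps back through the baseline quantile function, which is equivalent to solving for $x$ directly when $G^{-1}$ is available. The normal baseline, the function names and the seed are choices made for the example.

```python
import random
from statistics import NormalDist


def noll_cdf_from_p(p, alpha, beta):
    """NOLL-G cdf written in terms of p = G(x): p^alpha / (p^alpha + (1-p)^beta)."""
    return p ** alpha / (p ** alpha + (1.0 - p) ** beta)


def solve_p(u, alpha, beta, tol=1e-12):
    """Bisection for p in (0, 1) with noll_cdf_from_p(p) = u (increasing in p)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if noll_cdf_from_p(mid, alpha, beta) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def noll_rvs(n, alpha, beta, base_ppf, seed=None):
    """Draw n values from NOLL-G given the baseline quantile function base_ppf."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):                    # step III: repeat n times
        u = rng.random()                  # step I: U ~ U(0, 1)
        p = solve_p(u, alpha, beta)       # step II, solved for p = G(x)
        draws.append(base_ppf(p))         # back-transform: x = G^{-1}(p)
    return draws


# Example with a normal baseline (mu = 2, sigma = 1), i.e. a NOLLN sample.
base = NormalDist(mu=2.0, sigma=1.0)
sample = noll_rvs(1000, alpha=0.5, beta=0.5, base_ppf=base.inv_cdf, seed=42)
```

Working on the probability scale keeps the root search on the bounded interval (0, 1) whatever the support of the baseline distribution.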


The main motivation of the study is to provide a more flexible G-class family of distributions for modeling data sets with different hazard rate shapes, such as increasing, decreasing, upside-down and bathtub hazard rates. Also, the proposed NOLL-G family is capable of modeling left-skewed, right-skewed, symmetric and bimodal data sets. The NOLL-G family can be viewed as a generalization of the OLL-G family and, more broadly, of the parent distribution. More importantly, thanks to the flexibility of the NOLL-G family, we define a regression model for a censored dependent variable. In the next section, special members of the NOLL-G family are introduced.


2. Special NOLL models 
2.1 The new odd log-logistic Weibull (NOLLW) model


The cdf of the Weibull distribution is $G(x;\xi) = 1 - \exp[-(x/b)^{a}]$, where $\xi = (b,a)^{\top}$, $a > 0$ is the shape parameter and $b > 0$ is the scale parameter. Inserting the pdf and cdf of the Weibull distribution in (4), we have the pdf of the NOLLW distribution, which is omitted here for the sake of simplicity. The density and hrf shapes of the NOLLW distribution are displayed in Figures 1 and 2. The NOLLW has right-skewed and bimodal densities. Also, it has flexible hrf shapes such as increasing, decreasing, bathtub, constant and upside-down.
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Figure 1: The pdf plots of NOLLW model. 
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Figure 2: The hrf plots of NOLLW model. 


2.2 The new odd log-logistic normal (NOLLN) model 


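Inserting the pdf and cdf of the normal distribution in (4), we have the following pdf for the NOLLN distribution:

$$f(x) = \frac{\phi(z)\,\Phi(z)^{\alpha-1}\,\bar{\Phi}(z)^{\beta-1}\left[\alpha + (\beta-\alpha)\,\Phi(z)\right]}{\sigma\left[\Phi(z)^{\alpha} + \bar{\Phi}(z)^{\beta}\right]^{2}},$$

where, $x \in \mathbb{R}$ and $z = (x-\mu)/\sigma$. The parameter $\mu \in \mathbb{R}$ is a location parameter and $\sigma > 0$ is a scale parameter, $\phi(\cdot)$ and $\Phi(\cdot)$ are the pdf and cdf of the standard normal distribution and $\bar{\Phi}(z) = 1 - \Phi(z)$. Figure 3 displays the density shapes of the NOLLN distribution.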
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Figure 3: The pdf plots of NOLLN model. 


2.3 The new odd log-logistic gamma (NOLLGa) model 


The gamma distribution is another famous statistical distribution to model right-skewed data sets. The cdf of the gamma distribution is $G_{a,b}(x) = 1 - \Gamma(a, x/b)/\Gamma(a)$, where $\Gamma(a)$ is the gamma function and $\Gamma(a,\cdot)$ is the incomplete gamma function. The shape and scale parameters are $a > 0$ and $b > 0$, respectively. Inserting the cdf and pdf of the gamma distribution in (4), we have the pdf of the NOLLGa distribution, which is omitted here for the sake of simplicity. Some density and hrf shapes of the NOLLGa are displayed in Figure 4. The NOLLGa distribution has right-skewed and bimodal densities as well as increasing and upside-down hrf shapes. The new generalization of the gamma distribution opens new opportunities to model bimodal right-skewed data sets.


3. General properties 


Several statistical properties of the NOLL-G model are obtained in the rest of this section. 


3.1 Useful expansions 


Let $G(x)$ be a parent distribution. The exponentiated-G (Exp-G) model with power parameter $c > 0$ is defined by the cdf and pdf $H_c(x) = G(x)^{c}$ and $h_c(x) = c\,g(x)\,G(x)^{c-1}$, respectively. Using the Exp-G model, several properties of the NOLL-G model can be obtained. Initially, we provide an expansion for the cdf of the NOLL-G model using the power series for $G(x)^{\alpha}$ ($\alpha > 0$ real) as,

$$G(x)^{\alpha} = \sum_{k=0}^{\infty} a_k\,G(x)^{k}, \qquad (7)$$

where,

$$a_k = a_k(\alpha) = \sum_{j=k}^{\infty} (-1)^{k+j} \binom{\alpha}{j} \binom{j}{k}. \qquad (8)$$
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Figure 4: The pdf and hrf plots of NOLLGa model. 


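For any real $\beta > 0$, consider the generalized binomial expansion

$$\bar{G}(x)^{\beta} = [1 - G(x)]^{\beta} = \sum_{k=0}^{\infty} (-1)^{k} \binom{\beta}{k} G(x)^{k}. \qquad (9)$$

Inserting (7) and (9) in the denominator of (3), the cdf of the NOLL-G can be expressed as

$$F(x) = \frac{G(x)^{\alpha}}{\sum_{k=0}^{\infty} b_k\,G(x)^{k}}, \qquad (10)$$

where, $b_k = a_k + (-1)^{k} \binom{\beta}{k}$. Using the inverse of power series, (10) can be written as follows:

$$F(x) = \sum_{k=0}^{\infty} c_k\,G(x)^{k+\alpha}, \qquad (11)$$

where, $c_0 = 1/b_0$ and for $k \ge 1$, the $c_k$'s are determined from the recurrence equation,

$$c_k = -\frac{1}{b_0} \sum_{r=1}^{k} b_r\,c_{k-r}.$$

Differentiating the equation (11), we have the pdf of $X$,

$$f(x) = \sum_{k=0}^{\infty} c_k\,h_{k+\alpha}(x), \qquad (12)$$

where, $h_{k+\alpha}(x) = (k+\alpha)\,G(x)^{k+\alpha-1}\,g(x)$ is the Exp-G density function with power parameter $k+\alpha$. So, equation (12) shows that the NOLL-G model can be expressed as a linear combination of Exp-G densities. Using this fact, we obtain several properties of the NOLL-G model.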


3.2 Moments 


Here, we derive the moments of the NOLL-G model, with emphasis on the special case of the NOLLW model.

Using the mixture representation of the NOLL-G, the $r$th raw moment of $X$ is given by,

$$\mu'_r = E(X^{r}) = \sum_{k=0}^{\infty} c_k\,E(Y_{k+\alpha}^{r}), \qquad (13)$$

where $Y_c$ denotes an Exp-G random variable with power parameter $c$ and $E(Y_c^{r}) = c \int_{-\infty}^{\infty} x^{r}\,g(x)\,G(x)^{c-1}\,dx$, which can be calculated numerically using the quantile function (qf) $Q_G(u) = G^{-1}(u)$ as follows:

$$E(Y_c^{r}) = c \int_0^1 Q_G(u)^{r}\,u^{c-1}\,du.$$

We have the mean of $X$ for $r = 1$.
For the special NOLLW, we have

$$\mu'_r = \sum_{k,w=0}^{\infty} \upsilon_{k,w}\,b^{r}\,\Gamma\!\left(\frac{r}{a} + 1\right),$$

where, $\upsilon_{k,w} = c_k\,\dfrac{(k+\alpha)(-1)^{w}}{(w+1)^{r/a+1}} \dbinom{k+\alpha-1}{w}$.


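3.3 Incomplete moments and moment generating function


The $r$th incomplete moment of $X$ is defined by,

$$m_r(y) = \int_{-\infty}^{y} x^{r} f(x)\,dx,$$

which can be rewritten as,

$$m_r(y) = \sum_{k=0}^{\infty} c_k\,I_{k+\alpha}(y),$$

where, $I_c(y) = c \int_0^{G(y)} Q_G(u)^{r}\,u^{c-1}\,du$. The integral $I_c(y)$ can be determined analytically for special models using the qf. For the NOLLW model, we have,

$$m_r(y) = \sum_{k,w=0}^{\infty} \upsilon_{k,w}\,b^{r}\,\gamma\!\left(\frac{r}{a} + 1, (w+1)\left(\frac{y}{b}\right)^{a}\right),$$

where $\upsilon_{k,w} = c_k\,(k+\alpha)(-1)^{w}(w+1)^{-(r/a+1)} \binom{k+\alpha-1}{w}$ and $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function.

The moment generating function (mgf) of $X$, say $M(t) = E(e^{tX})$, is determined from (12) as follows:

$$M(t) = \sum_{k=0}^{\infty} c_k\,M_{k+\alpha}(t),$$

where, $M_c(t) = c \int_{-\infty}^{\infty} e^{tx}\,G(x)^{c-1}\,g(x)\,dx = c \int_0^1 \exp[t\,Q_G(u)]\,u^{c-1}\,du$ is the generating function of $Y_c$, the Exp-G random variable with power parameter $c$.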
ey 0 


4. Estimation 


The maximum likelihood estimation (MLE) method is used. Let $\Theta = (\alpha,\beta,\xi)^{\top}$ be the unknown parameter vector. The log-likelihood function of the NOLL family for a sample $x_1,\dots,x_n$ is,

$$\ell(\Theta) = \sum_{i=1}^{n} \log g(x_i;\xi) + (\alpha-1) \sum_{i=1}^{n} \log G(x_i;\xi) + (\beta-1) \sum_{i=1}^{n} \log \bar{G}(x_i;\xi) + \sum_{i=1}^{n} \log\left[\alpha + (\beta-\alpha)\,G(x_i;\xi)\right] - 2 \sum_{i=1}^{n} \log\left[G(x_i;\xi)^{\alpha} + \bar{G}(x_i;\xi)^{\beta}\right].$$

The given log-likelihood function is maximized to obtain the MLEs of the parameters of the NOLL family. This procedure is implemented with the optim function of the R software. The Hessian is also used to obtain the observed information matrix, from which the confidence intervals of the parameters are constructed.
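A minimal sketch of this maximization for the NOLLN special case is given below. It is not the authors' R code: it uses SciPy's `minimize` with the Nelder-Mead method as a stand-in for `optim`, the positive parameters are optimized on the log scale (an assumption made here to avoid constraints), and the data are simulated placeholders.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def noll_normal_negloglik(params, x):
    """Negative NOLLN log-likelihood; params = (log alpha, log beta, mu, log sigma)."""
    alpha, beta, sigma = np.exp(params[0]), np.exp(params[1]), np.exp(params[3])
    z = (x - params[2]) / sigma
    log_g = norm.logpdf(z) - np.log(sigma)    # log g(x_i)
    log_G = norm.logcdf(z)                    # log G(x_i)
    log_S = norm.logsf(z)                     # log of 1 - G(x_i)
    # log[G^alpha + (1 - G)^beta], computed stably on the log scale
    log_den = np.logaddexp(alpha * log_G, beta * log_S)
    loglik = (log_g + (alpha - 1.0) * log_G + (beta - 1.0) * log_S
              + np.log(alpha + (beta - alpha) * np.exp(log_G)) - 2.0 * log_den)
    return -np.sum(loglik)


rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)   # placeholder data, not a real data set
# Start from the parent normal fit (alpha = beta = 1).
start = np.array([0.0, 0.0, x.mean(), np.log(x.std())])
fit = minimize(noll_normal_negloglik, start, args=(x,), method="Nelder-Mead")

alpha_hat, beta_hat = np.exp(fit.x[0]), np.exp(fit.x[1])
mu_hat, sigma_hat = fit.x[2], np.exp(fit.x[3])
```

Starting at $\alpha = \beta = 1$ means the search begins at the parent normal model, so the fitted negative log-likelihood cannot be worse than that of the normal fit. Standard errors would require a Hessian of the log-likelihood on the original parameter scale, which is not computed in this sketch.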


5. Simulation 


Now, we discuss the efficiency of the MLE method for the NOLLN model. The simulation results are evaluated 
based on the estimated biases, mean square error (MSE), average length (AL) and coverage probability (CP). 
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Figure 5: Simulation results of NOLLN model. 


The simulation is repeated $N = 10{,}000$ times. The selected parameter values are $\alpha = 0.5$, $\beta = 0.5$, $\mu = 2$ and $\sigma = 1$. The sample size is increased in steps of 5, from $n = 50$ to $n = 1{,}000$. The simulation results are summarized in Figure 5. We expect the estimated biases, MSEs and ALs to decrease as the sample size increases, and the CP to be close to the nominal level. The results verify these expectations, and it is concluded that the MLE is an appropriate method for estimating the parameters of the NOLL-G model.


6. Regression model 


The aim of the regression models is to explain the variability of the dependent variable using some relevant 
covariates. In this section, we introduce a new regression model for the censored dependent variable. To 
do this, we benefit from the NOLLW density. First, we use the appropriate transformation on the NOLLW 
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Figure 6: The pdf plots of the LNOLLW model. 


density. Let $X$ follow the NOLLW density and consider the random variable $Y = \log(X)$. Substituting $a = 1/\sigma$ and $b = \exp(\mu)$, the density of $Y$ (for $y \in \mathbb{R}$) is,

$$f(y) = \frac{\exp(z - e^{z})\,[1 - \exp(-e^{z})]^{\alpha-1}\,\exp[-(\beta-1)e^{z}]\,\{\alpha + (\beta-\alpha)[1 - \exp(-e^{z})]\}}{\sigma\,\{[1 - \exp(-e^{z})]^{\alpha} + \exp(-\beta e^{z})\}^{2}}, \quad z = \frac{y-\mu}{\sigma}, \qquad (14)$$

where, $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ is the scale parameter, and $\alpha > 0$, $\beta > 0$ are the shape parameters. From now on, we denote the density in (14) as $Y \sim$ LNOLLW$(\alpha,\beta,\sigma,\mu)$. The plots of the LNOLLW density are displayed in Figure 6. It is concluded that the new density can be used to model symmetric, left-skewed and bimodal lifetime data sets. The survival function of (14) is given by,

$$S(y) = \frac{\exp(-\beta e^{z})}{[1 - \exp(-e^{z})]^{\alpha} + \exp(-\beta e^{z})}. \qquad (15)$$

Thanks to the flexible density of the LNOLLW, we propose a new regression model based on the following model:

$$y_i = x_i^{\top}\boldsymbol{\beta} + \sigma z_i, \quad i = 1,2,\dots,n,$$

where, $y_i$ is the dependent variable following the density in (14) and $x_i^{\top} = (1, x_{i1},\dots,x_{ip})$ is the vector of covariates for the $i$th individual. The vector $\boldsymbol{\beta} = (\beta_0,\beta_1,\dots,\beta_p)^{\top}$ represents the unknown regression parameters. The scale parameter is given by $\sigma > 0$. We use the identity link function, $\mu_i = x_i^{\top}\boldsymbol{\beta}$. The LNOLLW regression model contains the log-Weibull (LW) regression model as its sub-model (Lawless, 2011).
Let the response be defined as $y_i = \min\{\log(x_i), \log(c_i)\}$, where $x_i$ and $c_i$ are the lifetimes and censoring times, respectively. We define two sets, $F$ and $C$, containing the individuals for which the lifetime and the censoring time are observed, respectively. Let $\tau = (\alpha,\beta,\sigma,\boldsymbol{\beta}^{\top})^{\top}$ be the parameter vector of the LNOLLW model. The log-likelihood function of the model is,

$$\ell(\tau) = \sum_{i \in F} \ell_i(\tau) + \sum_{i \in C} \ell_i^{(c)}(\tau),$$

where, $\ell_i(\tau) = \log[f(y_i)]$ and $\ell_i^{(c)}(\tau) = \log[S(y_i)]$. So, the total log-likelihood function is

$$\begin{aligned}
\ell(\tau) ={}& r \log\left(\frac{1}{\sigma}\right) + \sum_{i \in F} (z_i - u_i) + (\alpha-1) \sum_{i \in F} \log\{1 - \exp[-u_i]\} \\
&+ (\beta-1) \sum_{i \in F} \log(\exp[-u_i]) + \sum_{i \in F} \log\left[\alpha + (\beta-\alpha)\{1 - \exp[-u_i]\}\right] \\
&- 2 \sum_{i \in F} \log\left[\{1 - \exp[-u_i]\}^{\alpha} + (\exp[-u_i])^{\beta}\right] \\
&+ \beta \sum_{i \in C} \log(\exp[-u_i]) - \sum_{i \in C} \log\left[\{1 - \exp[-u_i]\}^{\alpha} + (\exp[-u_i])^{\beta}\right], \qquad (16)
\end{aligned}$$

where, $u_i = \exp(z_i)$, $z_i = (y_i - x_i^{\top}\boldsymbol{\beta})/\sigma$ and $r$ is the number of uncensored observations. The model parameters are estimated using the MLE method. The log-likelihood function in (16) is maximized using the optim function.

Two residuals are generally used to check the suitability of the regression model for the fitted data. The first is the martingale residual of Fleming and Harrington (1994). The interpretation of the martingale residual is problematic, so the modified deviance residuals are preferred (Therneau et al., 1990). Here, the modified deviance residuals are used to check the assumption on the residuals of the LNOLLW regression model.
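The two residuals can be sketched as follows. The formulas are not written out in this section, so the code assumes the standard definitions: the martingale residual $r_{M_i} = \delta_i + \log S(y_i)$ and the modified deviance residual $r_{D_i} = \mathrm{sign}(r_{M_i})\{-2[r_{M_i} + \delta_i \log(\delta_i - r_{M_i})]\}^{1/2}$, where $\delta_i = 1$ for an uncensored and $\delta_i = 0$ for a censored observation. The survival function follows the form of (15) as reconstructed from the NOLLW model, and the function names are hypothetical.

```python
import math


def lnollw_survival(y, alpha, beta, sigma, mu):
    """LNOLLW survival function, with z = (y - mu) / sigma."""
    ez = math.exp((y - mu) / sigma)
    tail = math.exp(-beta * ez)
    return tail / ((1.0 - math.exp(-ez)) ** alpha + tail)


def martingale_residual(delta, surv):
    """delta = 1 for an observed lifetime, 0 for a censored one; 0 < surv < 1."""
    return delta + math.log(surv)


def modified_deviance_residual(delta, surv):
    """sign(r_M) * sqrt(-2 [r_M + delta * log(delta - r_M)])."""
    r_m = martingale_residual(delta, surv)
    inner = r_m + (math.log(delta - r_m) if delta == 1 else 0.0)
    # max(..., 0) guards against tiny negative values caused by rounding.
    return math.copysign(math.sqrt(max(-2.0 * inner, 0.0)), r_m)


# Example: residual of an uncensored response y = 0.4 under illustrative parameters.
s = lnollw_survival(0.4, alpha=1.2, beta=0.8, sigma=1.0, mu=0.0)
r_d = modified_deviance_residual(1, s)
```

In a fitted model, `mu` would be replaced by the linear predictor of each individual, and the residuals would then be plotted against the index or against normal quantiles.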


7. Applications 


In this section, several members of the NOLL-G family are compared with the existing models based on 
the real data modeling. The competitive models are listed in Table 1. Four applications of the NOLL-G 
model are presented in the rest of the section to show the importance and flexibility of the proposed models. 
The models are compared using the following goodness-of-fit measures: the negative log-likelihood ($-\ell$), the Anderson-Darling ($A^{*}$) and the Cramer-von Mises ($W^{*}$) test statistics.


7.1 Glass fibers dataset 


The data set is reported in Smith and Naylor (1987). The data represent the strengths of 1.5 cm glass fibers. Table 2 shows the estimated model parameters and goodness-of-fit statistics. The NOLLN model has the lowest values of the $A^{*}$ and $W^{*}$ statistics as well as the lowest value of $-\ell$. So, the NOLLN distribution is the best model for the data set among the fitted models. Figure 7 shows the fitted densities and cdfs of the models and supports the conclusion that the NOLLN model gives acceptable results for the data set.


Table 1: Competitive models and their abbreviations. 


Models                                    Abbreviations   References
Odd Log-Logistic-G Family                 OLL-G           Gleaton and Lynch (2006)
Kumaraswamy-G Family                      KUM-G           Cordeiro and de Castro (2011)
Exponentiated Generalized-G Family        EG-G            Cordeiro et al. (2013)
Odd Burr-G Family                         OBu-G           Alizadeh et al. (2017)
Generalized Odd Log-Logistic-G Family     GOLL-G          Cordeiro et al. (2017)
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Table 2: The results of the fitted model for the glass fibers dataset. 


Model | α β μ σ A* W* -ℓ

N | 1.506 0.321 1.928 0.35 17.911 
| 0 0.028 

OLL-N | 5.971 1.540 1.638 1.446 2.262 16.067 
| 10.841 0 2.941 

GOLLN | 1.559 0.012 2.295 0.074 1.139 0.205 14.626 
| 0.423 0.009 0.169 0.03 

NOLL-N | 0.609 6.570 2.022 0.46 0.482 0.082 11.900 
| 0.257 6.975 0.381 0.226 

KUMN | 0.027 36.099 3.566 0.12 0.963 0.172 14.191 
| 0.015 92.947 1.288 0.062 

EG-N 13.800 0.582 2.376 0.43 0.969 0.173 14.201 
25.448 0.439 0.65 0.251 

OBu-N | 242.460 3.409 1.868 83.768 0.754 0.135 13.183 
| 0.01 0.013 0.011 0.037 
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Figure 7: The graphical comparison of the fitted models for the glass fibers data set. 


7.2 Daily ozone measurements 


In the second application, the data comes from the daily ozone measurements in New York between the dates 
of May-September 1973. The data is fitted by NOLLW and other competitive models. The results are listed 
in Table 3. The NOLLW distribution has the lowest values for the goodness-of-fit statistics. Also, Figure 8 
shows that the NOLLW distribution gives better results than other competitive models. 


7.3 Failure times 


In the third application, the data consist of the 73 failure times (in hours) of unscheduled maintenance actions for the USS Halfbeak number 4 main propulsion diesel engine over 25,518 operating hours. The estimated parameters and goodness-of-fit statistics are listed in Table 4. Again, the proposed model, here the NOLLGa distribution, has the lowest values for these statistics. Therefore, the NOLLGa distribution outperforms the other competitive models. The suitability of the NOLLGa model for the data is checked graphically in Figure 9.
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Table 3: The results of the fitted model for the daily ozone dataset. 


Model α β a b A* W* -ℓ

WwW 1.340 46.074 0.966 0.17 542.610 
0.095 3.374 

OLL-W 1.308 1.067 47.653 0.822 0.139 542.284 
0.503 0.368 4.976 

GOLL-W 0.354 6.049 1.320 18.757 0.283 0.047 539.275 
0.157 2.796 0.319 5.031 

NOLL-W 1.395 0.233 1.714 24.744 0.14 0.019 537.933 
0.23 0.082 0.243 0.459 

KUM-W 3.643 0.225 0.958 6.870 0.357 0.048 541.019 
2.071 0.221 0.271 3.098 

E-W 0.284 2.587 0.834 4.688 0.54 0.085 541.202 
0.704 1.616 0.242 13.184 

Obu-W 1.797 0.148 1175 14.010 0.28 0.041 539.442 
0.426 0.08 0.126 4.134 
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Figure 8: The graphical comparison of the fitted models for the daily ozone data set. 


7.4 Heart transplant data 


In the last application, we show the usefulness of the LNOLLW regression model. We use the Stanford heart transplant data set, which contains information on 103 individuals. The dependent variable is the survival time of the individuals, which is modeled by the following covariates.


$x_1$ - year of acceptance to the program;
$x_2$ - age;
$x_3$ - surgery (1 = yes, 0 = no);
$x_4$ - transplant (1 = yes, 0 = no).
The same data set was modeled by Brito et al. (2017) using the log Topp-Leone odd log-logistic Weibull regression model, denoted LTLOLLW for short. The data set is fitted by three models: LNOLLW, LTLOLLW and LW.
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Table 4: The results of the fitted model for the failure times data set. 


Models α β a | b A* W* -ℓ
Ga 5.832 | 0.3 8.414 1.656 244.433 
0.952 | 0.051 
OLL-Ga 4.783 | | 0.02 7.263 1.401 236.534 
0.915 0.167 | 0.007 
GOLL-Ga 4576 0.27 2717 | 0.041 7.171 1.382 235.883 
1.484 0.272 2421 | 0.03 
NOLL-Ga 0.138 4567.849 11.260 | 0.143 0.43 0.07 194.090 
0.033 97.223 4268 | 0.122 
KUM-Ga 1.021 6.910 4699 | 0.118 7.374 1.435 236.211 
0.845 3.355 3.809 | 0.075 
EG-Ga 6.997 0.306 14.968 0.424 6.306 1.217 228.722 
3.704 0.053 0.002 0.002 
OBu-Ga 3.679 2.523 0.9019 | 0.023 6.263 1.199 228.442 
0.723 0.741 0.24 | 0.0095 
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Figure 9: The graphical comparison of the fitted models for the failure times data set. 


The results are given in Table 5. The best model is selected based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The model having the lowest values of these statistics is the best model for the data set. As seen from Table 5, the proposed model has lower values of these statistics than the other two regression models. Therefore, the LNOLLW regression model produces more acceptable results than the other models for the current data set. Additionally, the regression parameters $\beta_1$ and $\beta_2$ are statistically significant.
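The AIC and BIC values can be recomputed from the negative log-likelihoods with the usual definitions $\mathrm{AIC} = 2k - 2\ell$ and $\mathrm{BIC} = k\log(n) - 2\ell$. In the sketch below, the $-\ell$ values and $n = 103$ come from Table 5 and the text, while the parameter counts $k$ (6 for LW, 8 for LTLOLLW and LNOLLW) are inferred here from the rows of Table 5 and should be read as an assumption.

```python
import math


def aic(neg_loglik, k):
    """Akaike Information Criterion from the negative log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * k


def bic(neg_loglik, k, n):
    """Bayesian Information Criterion from the negative log-likelihood."""
    return 2.0 * neg_loglik + k * math.log(n)


n = 103  # number of individuals in the heart transplant data
# model: (negative log-likelihood from Table 5, assumed number of parameters)
fits = {"LW": (171.240, 6), "LTLOLLW": (164.684, 8), "LNOLLW": (159.360, 8)}

criteria = {name: (aic(nll, k), bic(nll, k, n)) for name, (nll, k) in fits.items()}
best = min(criteria, key=lambda name: criteria[name][0])  # smallest AIC
```

With these inputs the computed values agree with the AIC and BIC rows of Table 5 up to rounding, and the LNOLLW model has the smallest value of both criteria.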

The validity of the LNOLLW model is checked by the residual analysis. Figure 10 displays the quantile- 
quantile plot of the modified deviance residuals and its index plot. From Figure 10, we conclude that none 
of the observations can be evaluated as possible outliers. Therefore, the fitted model is appropriate for the 
current data. 
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Table 5: The results of the fitted regression models. 


Models   LW                           LTLOLLW                      LNOLLW
         MLEs     SE      p-value     MLEs     SE      p-value     MLEs     SE      p-value
α        -        -       -           2.340    3.546   -           4.674    10.120  -
β        -        -       -           24.029   3.015   -           3.815    17.608  -
σ        1.478    0.133   -           9.680    12.526  -           5.455    15.627  -
β0       1.639    6.835   0.811       -0.645   8.459   0.939       3.777    11.725  0.747
β1       0.104    0.096   0.279       0.074    0.097   0.448       0.214    0.096   0.026
β2       -0.092   0.02    < 0.001     -0.053   0.020   0.009       -0.053   0.018   0.003
β3       1.126    0.658   0.087       1.676    0.597   0.005       0.174    0.497   0.726
β4       2.544    0.378   < 0.001     2.394    0.384   < 0.001     0.445    0.373   0.233
-ℓ       171.240                      164.684                      159.360
AIC      354.481                      345.368                      334.721
BIC      370.2894                     366.446                      355.799
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Figure 10: The plots of the modified deviance residual. 


8. Conclusion and future work 


This paper introduces the NOLL family of distributions. Important special models belonging to the NOLL family are applied to real data sets to demonstrate the applicability of the proposed models. The regression model of the NOLLW distribution is defined based on the location-scale family. As future work, we plan to develop a new generalization of the Pareto distribution using the NOLL family to analyze extreme events with the peaks-over-threshold methodology.
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Chapter 6 


On the Family of Generalized 
Topp-Leone Aresin Distributions 
Vikas Kumar Sharma,'* Komal Shekhawat? and Sanjay Kumar Singh' 


1. Introduction 


The past decade was a prolific period for the development of new probability distributions. Various extended or modified families of probability distributions were proposed for fitting lifetime data, count data, and other random phenomena from applied areas. We mention some of the recent works here. Eliwa et al. (2021) introduced the exponentiated odd Chen-generated (G) family of distributions, which can serve as a lifetime distribution for modelling positively and negatively skewed data sets. El-Morshedy et al. (2021) also proposed an exponentiated type distribution which is suitable for fitting both symmetric and asymmetric data. Among others, Alzaatreh et al. (2013) presented a method for generating new classes of distributions. The method is described as follows. Let $T$ be a random variable (RV) of a generator distribution with PDF $r(t)$ defined on $[a,b]$ and $X$ be a continuous baseline RV with CDF $G_X(x)$. The CDF of this family (called the TX family) is given by,

$$F_{TX}(x) = \int_a^{W(G_X(x))} r(t)\,dt = R\{W(G_X(x))\}, \qquad (1)$$

where, $W(G_X(x)) \in [a,b]$ is a differentiable and monotonically increasing function in $x$ and $R(t)$ is the CDF of the RV $T$. This approach is widely used in the statistical literature. It was first utilized by Eugene et al. (2002), who introduced a four-parameter beta-normal distribution by assuming $T \sim \mathrm{Beta}(\alpha,\beta)$ and $X \sim \mathrm{Normal}(\mu,\sigma^{2})$ with $W(G_X(x)) = G_X(x)$. After Alzaatreh et al. (2013), various distributions were proposed with different choices of the distributions of $X$ and $T$. For instance, Alzaatreh et al. (2014) and Sharma et al. (2017) introduced the gamma-normal and Maxwell-Weibull distributions, respectively.
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It is always good to have the simplest generator for developing flexible and parsimonious distributions. 
The TLD by Topp and Leone (1955) is considered to be a good choice for producing the extended distribu- 
tions. A recent article on the TLD by Shekhawat and Sharma (2021) can be followed for associated theories, 
parameter estimation and application. Sangsanit and Bodhisuwan (2016) investigated the use of the TLD as 
a generator in (1) while they considered the generalized exponential distribution as the baseline distribution. 
Sharma (2018) introduced the Topp-Leone normal distribution for fitting skewed data sets that has increasing 
failure rate function. 

Using the TLD, Chesneau et al. (2021) recently established a potential family of distributions that unifies various known classes of probability distributions. They defined the family by the following CDF,

$$F(x) = [G(x;\xi)]^{\lambda}\,[2 - G(x;\xi)]^{\beta\lambda}, \quad \beta \le 1,\ \lambda > 0,\ x \in \mathbb{R}, \qquad (2)$$

where, $\xi$ is the parameter(s) of the baseline CDF, $G(x;\xi)$. It is called the ETL-G family of distributions. Importantly, we note that this family is a wrapper of various known families as follows:

• When $\beta = 0$, it reduces to the family of the Lehmann-type distributions, AL-Hussaini and Ahsanullah (2015).

• When $\beta = 1$, it reduces to the family of the Topp-Leone-G distributions, Sangsanit and Bodhisuwan (2016).

• When $\beta = -1$, it reduces to the exp-half-G family discussed by Bakouch (2020).

• When $\beta = -1$ and $\lambda = 1$, it can be seen as a particular member of the Marshall-Olkin family, Marshall and Olkin (1997).

Chesneau et al. (2021) illustrated this family with six well-known probability distributions and advocated the use of the ETL-Weibull distribution over the beta and Kumaraswamy type Weibull distributions for fitting the GAG concentration level in urine.

Recently, trigonometric distributions have shown high relevance and applicability for modelling various real-life phenomena. The most significant approach is to use trigonometric transformations for introducing generalized families of distributions. The TLD and the ASD (see Feller (1967)) are simple and easily tractable, yet they hold impressive statistical properties and applications. During the last few years, the TLD has been employed by many authors to produce probability distributions. Al-Babtain et al. (2020) developed the Sin Topp-Leone-G family of distributions. Al-Shomrani et al. (2016); Brito et al. (2017); Yousof et al. (2017); Rezaei et al. (2017); Elgarhy et al. (2018); Hassan et al. (2019); Al-Babtain et al. (2020); Chipepa et al. (2020); Reyad et al. (2021); Moakofi et al. (2022); Adeyinka (2022); Oluyede et al. (2022) are among the important works proposing generalized families of distributions based on the TLD.

In this chapter, we consider the ASD as the baseline distribution. The ASD first appeared in Feller (1967) as a model of the random walk process. Various researchers have done extensive work based on the ASD due to its useful properties. It is effective in modelling unit range and symmetric distributions. Schmidt and Zhigljavsky (2009), Arnold and Groeneveld (1980) and Ahsanullah (2015) discussed its characterizations. The ASD is a special case of the McDonald distribution and is therefore closely linked with the beta and Kumaraswamy distributions, see Cordeiro and Lemonte (2014); Cordeiro et al. (2016). The CDF of the ASD is $F(x) = \frac{2}{\pi}\arcsin(\sqrt{x})$, $x \in (0,1)$, which is quite simple and tractable. Cordeiro and Lemonte (2014) proposed the McDonald-arcsine distribution using the idea of the TX family.

The above discussion on the TLD and ASD motivates us to combine both distributions in the TX family given in (1), which has the potential to produce a flexible distribution based on trigonometric functions. We further consider the ETL family with the ASD to introduce a flexible and parsimonious distribution for fitting unit range data sets. The CDF of the proposed distribution is given by,


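$$F(x) = \left[\frac{2}{\pi}\arcsin(\sqrt{x})\right]^{\lambda}\left[2 - \frac{2}{\pi}\arcsin(\sqrt{x})\right]^{\lambda\beta}, \quad x \in (0,1),\ \lambda > 0,\ \beta \le 1. \qquad (3)$$

We call this distribution the ETLASD. Writing $G(x) = \frac{2}{\pi}\arcsin(\sqrt{x})$ and $g(x) = 1/[\pi\sqrt{x(1-x)}]$ for the baseline CDF and PDF, the ETLASD has the PDF

$$f(x) = \lambda\,[G(x)]^{\lambda-1}\,[2-(\beta+1)G(x)]\,[2-G(x)]^{\lambda\beta-1}\,g(x)$$

$$\phantom{f(x)} = \frac{\lambda}{\pi\sqrt{x(1-x)}}\left[\frac{2}{\pi}\arcsin(\sqrt{x})\right]^{\lambda-1}\left[2-(\beta+1)\frac{2}{\pi}\arcsin(\sqrt{x})\right]\left[2-\frac{2}{\pi}\arcsin(\sqrt{x})\right]^{\lambda\beta-1}, \quad x \in (0,1). \qquad (4)$$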
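The CDF (3) and PDF (4) can be checked numerically. The following is a minimal sketch, not the authors' implementation; the function names `g_base`, `etlasd_cdf`, `etlasd_pdf` and `numeric_derivative` are illustrative assumptions. It evaluates both functions and compares the PDF with a central-difference derivative of the CDF.

```python
import math

def g_base(x):
    """Arcsine baseline CDF G(x) = (2/pi) * arcsin(sqrt(x)) on (0, 1)."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

def etlasd_cdf(x, lam, beta):
    """ETLASD CDF, equation (3): G^lam * (2 - G)^(lam * beta)."""
    G = g_base(x)
    return G ** lam * (2.0 - G) ** (lam * beta)

def etlasd_pdf(x, lam, beta):
    """ETLASD PDF, equation (4)."""
    G = g_base(x)
    g = 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))  # arcsine density
    return (lam * g * G ** (lam - 1.0)
            * (2.0 - (beta + 1.0) * G)
            * (2.0 - G) ** (lam * beta - 1.0))

def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

if __name__ == "__main__":
    lam, beta = 0.9, 0.9  # arbitrary illustrative parameter values
    for x in (0.2, 0.5, 0.8):
        d = numeric_derivative(lambda t: etlasd_cdf(t, lam, beta), x)
        print(x, etlasd_pdf(x, lam, beta), d)
```

For $\lambda = 1$ and $\beta = 0$ the functions reduce to the arcsine CDF and PDF, which gives a quick sanity check.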


Figure 1 illustrates various shapes of the PDF depending on the parameter values. The distribution has bathtub, increasing or decreasing, J and reversed J shaped frequency curves. Figure 2 shows the shapes of the HRF for various choices of the parameters. This figure reveals that the ETLASD is capable of fitting data sets with increasing and bathtub shaped HRFs. From these figures, we conclude that the ETLASD may be useful for fitting a variety of data sets. That justifies the investigation of this distribution over other unit range distributions (see Section 9 on application).

The remaining parts of the chapter are organized into the following sections: In Section 2, we derive 
the significant expansions of the ETLASD. Section 3 is dedicated to the moments in which we derive cen- 
tral, raw and incomplete moments, cumulants and characteristic functions. In Section 4, measures of the 
quantiles, skewness and peakedness are obtained. Entropy measures are discussed in Section 5. In Section 
6, the maximum likelihood estimation is utilized to estimate the unknown parameters. Stochastic ordering, 
stress strength reliability and identifiability are discussed in Sections 7 and 8, respectively. We provide an 
application of the ETLASD and compare it to its competing distributions in Section 9. Section 10 brings the 
chapter to a conclusion. 


Figure 1: The PDF of the ETLASD($\lambda$, $\beta$).


2. Expansion 


Figure 2: The CDF and HRF of the ETLASD($\lambda$, $\beta$).

In this section, we expand the ETLASD PDF and CDF as linear combinations of the Lehmann type I distributions, which are well studied in the statistics literature, see Cordeiro et al. (2013). The expansion provides tractable properties such as moments and related measures. The expansion of the ETLASD can be achieved using the arcsin function expansion. It is defined by


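$$\arcsin(\sqrt{x}) = \sum_{m=0}^{\infty} a_m\, x^{(2m+1)/2}, \quad |x| < 1, \qquad (5)$$

where $a_m = (2m)!/[(2m+1)\,2^{2m}\,(m!)^2]$.
The power series raised to a positive integer $r$ can be expanded as,

$$\left(\sum_{m=0}^{\infty} a_m x^m\right)^{r} = \sum_{m=0}^{\infty} p_{r,m}\, x^m, \qquad (6)$$

where the coefficients $p_{r,m}$ ($m = 1, 2, \ldots$) can be calculated using the recursive equation (with $p_{r,0} = a_0^r$),

$$p_{r,m} = (m a_0)^{-1} \sum_{k=1}^{m} (rk - m + k)\, a_k\, p_{r,m-k}. \qquad (7)$$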

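Further, we use the exponentiated-G family to develop the linear combination of the beta distributions. The CDF and PDF of the exponentiated ASD with power parameter $a > 0$ are given by,

$$H_a(x) = \left[\frac{2}{\pi}\arcsin(\sqrt{x})\right]^{a} \quad \text{and} \quad h_a(x) = \frac{a}{\pi (x - x^2)^{1/2}}\left[\frac{2}{\pi}\arcsin(\sqrt{x})\right]^{a-1}, \text{ respectively.} \qquad (8)$$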


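Expanding $[2 - G(x)]^{\lambda\beta-1} = \sum_{i=0}^{\infty}\binom{\lambda\beta-1}{i}\,2^{\lambda\beta-i-1}(-1)^i\,[G(x)]^i$ in equation (4), we have,

$$f(x) = 2\lambda\, g(x)\sum_{i=0}^{\infty}\binom{\lambda\beta-1}{i}2^{\lambda\beta-i-1}(-1)^i[G(x)]^{\lambda+i-1} - (\beta+1)\lambda\, g(x)\sum_{i=0}^{\infty}\binom{\lambda\beta-1}{i}2^{\lambda\beta-i-1}(-1)^i[G(x)]^{\lambda+i},$$

$$f(x) = f_1(x)\ (\text{say}) - f_2(x)\ (\text{say}). \qquad (9)$$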
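Expanding the function $f_1(x)$, we have,

$$f_1(x) = \sum_{i=0}^{\infty}\binom{\lambda\beta-1}{i}(-1)^i\,2^{\lambda\beta-i}\,\lambda\, g(x)\,[G(x)]^{\lambda+i-1}, \qquad (10)$$

$$f_1(x) = \sum_{i=0}^{\infty} w_i\, h_{\lambda+i}(x), \qquad (11)$$

where $w_i = \frac{\lambda}{\lambda+i}\binom{\lambda\beta-1}{i}(-1)^i\,2^{\lambda\beta-i}$ and

$$h_{\lambda+i}(x) = \frac{\lambda+i}{\pi\sqrt{x(1-x)}}\left[\frac{2}{\pi}\arcsin(\sqrt{x})\right]^{\lambda+i-1}. \qquad (12)$$

Using the expansions (5) and (6), we obtain,

$$h_{\lambda+i}(x) = \frac{\lambda+i}{\pi}\left(\frac{2}{\pi}\right)^{\lambda+i-1}\sum_{m=0}^{\infty} p_{\lambda+i-1,m}\; x^{(2m+\lambda+i)/2-1}(1-x)^{1/2-1}, \qquad (13)$$

and

$$f_1(x) = \sum_{i,m=0}^{\infty} s(i,m)\; g_B\!\left(x;\ \frac{2m+\lambda+i}{2},\ \frac{1}{2}\right), \qquad (14)$$

where $s(i,m) = w_i\,\frac{(\lambda+i)\,2^{\lambda+i-1}}{\pi^{\lambda+i}}\;p_{\lambda+i-1,m}\;B\!\left(\frac{2m+\lambda+i}{2},\frac{1}{2}\right)$ and $g_B(\cdot\,;a,b)$ denotes the beta density function with parameters $a$ and $b$. Similarly, we have,

$$f_2(x) = \sum_{i,m=0}^{\infty} s'(i,m)\; g_B\!\left(x;\ \frac{2m+\lambda+i+1}{2},\ \frac{1}{2}\right), \qquad (15)$$

where $s'(i,m) = w'_i\,\frac{(\lambda+i+1)\,2^{\lambda+i}}{\pi^{\lambda+i+1}}\;p_{\lambda+i,m}\;B\!\left(\frac{2m+\lambda+i+1}{2},\frac{1}{2}\right)$ and $w'_i = \frac{(\beta+1)\lambda}{\lambda+i+1}(-1)^i\binom{\lambda\beta-1}{i}2^{\lambda\beta-i-1}$. Therefore, from the equation (9), we have:

$$f(x) = \sum_{i,m=0}^{\infty} s(i,m)\, g_B\!\left(x;\ \frac{2m+\lambda+i}{2},\ \frac{1}{2}\right) - \sum_{i,m=0}^{\infty} s'(i,m)\, g_B\!\left(x;\ \frac{2m+\lambda+i+1}{2},\ \frac{1}{2}\right), \quad x \in (0,1),\ \lambda > 0,\ \beta \le 1. \qquad (16)$$

The corresponding CDF is given by,

$$F(x) = \sum_{i,m=0}^{\infty} s(i,m)\, G_B\!\left(x;\ \frac{2m+\lambda+i}{2},\ \frac{1}{2}\right) - \sum_{i,m=0}^{\infty} s'(i,m)\, G_B\!\left(x;\ \frac{2m+\lambda+i+1}{2},\ \frac{1}{2}\right), \quad x \in (0,1),\ \lambda > 0,\ \beta \le 1, \qquad (17)$$

where $G_B(\cdot\,;a,b)$ denotes the beta CDF. Since the PDF can be written as a linear combination of beta densities, numerous statistical properties can be derived immediately from those of the beta distribution.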


3. Moments 


The moments are quantitative measures used to characterize distributions and to study their shapes, such as the flatness of the distribution and its degree of skewness. Here, we derive the expressions of the raw moments, from which the central moments, cumulants and other higher-order moments follow. The kth raw moment of the ETLASD is derived in the following Proposition.
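Proposition 1 For a RV $X$ that follows the ETLASD($\lambda$, $\beta$), the kth moment about the origin is given by,

$$E(X^k) = \sum_{i,m=0}^{\infty}\left[ s(i,m)\,\frac{B\!\left(m+k+\frac{\lambda+i}{2},\frac{1}{2}\right)}{B\!\left(m+\frac{\lambda+i}{2},\frac{1}{2}\right)} - s'(i,m)\,\frac{B\!\left(m+k+\frac{\lambda+i+1}{2},\frac{1}{2}\right)}{B\!\left(m+\frac{\lambda+i+1}{2},\frac{1}{2}\right)}\right]. \qquad (18)$$

Proof 1 Using the expansion given in (16), the kth moment is defined as,

$$\mu'_k = \sum_{i,m=0}^{\infty}\left[\frac{s(i,m)}{B\!\left(m+\frac{\lambda+i}{2},\frac{1}{2}\right)}\int_0^1 x^{m+k+\frac{\lambda+i}{2}-1}(1-x)^{\frac{1}{2}-1}\,dx - \frac{s'(i,m)}{B\!\left(m+\frac{\lambda+i+1}{2},\frac{1}{2}\right)}\int_0^1 x^{m+k+\frac{\lambda+i+1}{2}-1}(1-x)^{\frac{1}{2}-1}\,dx\right]$$

$$\phantom{\mu'_k} = \sum_{i,m=0}^{\infty}\left[ s(i,m)\,\frac{B\!\left(m+k+\frac{\lambda+i}{2},\frac{1}{2}\right)}{B\!\left(m+\frac{\lambda+i}{2},\frac{1}{2}\right)} - s'(i,m)\,\frac{B\!\left(m+k+\frac{\lambda+i+1}{2},\frac{1}{2}\right)}{B\!\left(m+\frac{\lambda+i+1}{2},\frac{1}{2}\right)}\right].$$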

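The central moments ($\mu_k$) and cumulants ($\kappa_k$) of $X$ can be determined respectively from the equations given below:

$$\mu_k = \sum_{s=0}^{k}\binom{k}{s}(-1)^s\,\mu_1'^{\,s}\,\mu'_{k-s} \quad \text{and} \quad \kappa_k = \mu'_k - \sum_{s=1}^{k-1}\binom{k-1}{s-1}\kappa_s\,\mu'_{k-s}, \qquad (19)$$

where $\kappa_1 = \mu'_1$, $\kappa_2 = \mu'_2 - \mu_1'^2$, $\kappa_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3$, $\kappa_4 = \mu'_4 - 4\mu'_3\mu'_1 - 3\mu_2'^2 + 12\mu'_2\mu_1'^2 - 6\mu_1'^4$. The measures of skewness ($\gamma_1 = \kappa_3/\kappa_2^{3/2}$) and kurtosis ($\gamma_2 = \kappa_4/\kappa_2^{2}$) can be calculated from the cumulants.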
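The cumulant recursion in (19) can be sketched in a few lines. This is an illustrative sketch only: the function names are assumptions, and the raw moments of the uniform distribution on (0,1), $\mu'_k = 1/(k+1)$, are used as input because their cumulants are known, not because they relate to the ETLASD.

```python
from math import comb

def cumulants_from_raw(raw):
    """raw[k-1] = E[X^k] for k = 1..K; returns [kappa_1, ..., kappa_K] via (19)."""
    K = len(raw)
    mu = [1.0] + list(raw)          # mu[0] = E[X^0] = 1
    kappa = [0.0] * (K + 1)
    for k in range(1, K + 1):
        kappa[k] = mu[k] - sum(
            comb(k - 1, s - 1) * kappa[s] * mu[k - s] for s in range(1, k)
        )
    return kappa[1:]

def skewness_kurtosis(raw):
    """gamma_1 = kappa_3 / kappa_2^(3/2) and gamma_2 = kappa_4 / kappa_2^2."""
    k = cumulants_from_raw(raw[:4])
    return k[2] / k[1] ** 1.5, k[3] / k[1] ** 2

if __name__ == "__main__":
    uniform_raw = [1.0 / (k + 1) for k in range(1, 5)]
    print(cumulants_from_raw(uniform_raw))
    print(skewness_kurtosis(uniform_raw))
```

Note that $\gamma_2 = \kappa_4/\kappa_2^2$ as defined above is the excess kurtosis, so it is negative for the uniform example.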

We further derive the expressions of the expectations derived from the ETLASD RV. These are given by, 


= B(m+k+4#,5-k) B(m+k+ Att 1 _f) 
E(X*(1—X)-*) = 292 ; 2.18 
( ( ) ) > (6 B(m+ Ati i) s (i, r) B(m+ Atitl +) ’ 


ee 1 ay 1 
B(X) = Yo stir) (1 sotserei) Sin (1 mae) 


_< s(i,r)(2m+r+4) s (i,r)(2Qn+A+44+1) 
axa) = ¥ (ate a) 
= A+i41 Aba be 
END (r (m+At**) aia a (m+ 5 )) (s(i,r) = s (ir) - 
2 tie 
amt aqes OT 


where, F (a) = 4 log(7(a)) is digamma function, and 


. Wm+r+i re 2m +Ati+1 
B(X tog(X)) = (si,7) oe AE ahr) sy i 5) * 


. 3, Ati ee ee een 28' (i, r)(2m + A +4) 
ae ae pe Qm+A+i+2)2 © 
3.1 Characteristic function


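The CF is a significant statistical property that completely defines the probability distribution. The CF is the Fourier transform of the PDF. The CF of the ETLASD is given by,

$$\phi(\lambda,\beta;t) = E(e^{itX}) = \int_0^1 e^{itx} f_1(x)\,dx - \int_0^1 e^{itx} f_2(x)\,dx$$

$$= \sum_{i,m=0}^{\infty}\left[ s(i,m)\int_0^1 e^{itx}\, g_B\!\left(x;\ m+\tfrac{\lambda+i}{2},\ \tfrac{1}{2}\right)dx - s'(i,m)\int_0^1 e^{itx}\, g_B\!\left(x;\ m+\tfrac{\lambda+i+1}{2},\ \tfrac{1}{2}\right)dx\right]$$

$$= \sum_{i,m=0}^{\infty}\left[ s(i,m)\;{}_1F_1\!\left(m+\tfrac{\lambda+i}{2};\ m+\tfrac{\lambda+i+1}{2};\ it\right) - s'(i,m)\;{}_1F_1\!\left(m+\tfrac{\lambda+i+1}{2};\ m+1+\tfrac{\lambda+i}{2};\ it\right)\right],$$

where ${}_1F_1(\cdot\,;\cdot\,;\cdot)$ is Kummer's confluent hypergeometric function (of the first kind). It also follows that the moment generating function is given by,

$$M_X(\lambda,\beta;t) = \sum_{i,m=0}^{\infty}\left[ s(i,m)\;{}_1F_1\!\left(m+\tfrac{\lambda+i}{2};\ m+\tfrac{\lambda+i+1}{2};\ t\right) - s'(i,m)\;{}_1F_1\!\left(m+\tfrac{\lambda+i+1}{2};\ m+1+\tfrac{\lambda+i}{2};\ t\right)\right]$$

$$= \sum_{i,m=0}^{\infty}\sum_{n=0}^{\infty}\left[ s(i,m)\,\frac{\left(m+\frac{\lambda+i}{2}\right)^{(n)}}{\left(m+\frac{\lambda+i+1}{2}\right)^{(n)}} - s'(i,m)\,\frac{\left(m+\frac{\lambda+i+1}{2}\right)^{(n)}}{\left(m+1+\frac{\lambda+i}{2}\right)^{(n)}}\right]\frac{t^n}{n!}.$$

Using the moment generating function, the kth raw moment is given by,

$$E(X^k) = \sum_{i,m=0}^{\infty}\left[ s(i,m)\,\frac{\left(m+\frac{\lambda+i}{2}\right)^{(k)}}{\left(m+\frac{\lambda+i+1}{2}\right)^{(k)}} - s'(i,m)\,\frac{\left(m+\frac{\lambda+i+1}{2}\right)^{(k)}}{\left(m+1+\frac{\lambda+i}{2}\right)^{(k)}}\right],$$

where $(c)^{(n)} = c(c+1)\cdots(c+n-1)$ is a Pochhammer symbol representing the rising factorial. The kth moment can also be expressed in terms of the Beta moments as given by,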


CO co 


EIX*) = 2m+A+itk—1 3S s(i,r)E(V*-2) 4 2m+A+i+k 3 s'(i,r)E(Z*-), 


Im+rAFi+kK M+ Im+ AFI KET 


where, Y ~ Beta (2m 5) and Z ~ Beta (see 5) : 


4. Quantiles, skewness and flatness 


In this section, we derive the QF of the ETLASD and use it to study the shape of the distribution. The QF is an alternative to the moments. For some distributions, the moments are not expressible in closed form but the quantiles are obtainable by inverting the CDF. In such cases, statistical measures can be obtained using the QF. The pth quantile (say, $Q_p$) is derived by solving the equation $F(Q_p) = p$, i.e., $Q_p = F^{-1}(p)$, $p \in (0,1)$. For the ETLASD, $Q_p$ is obtained from

$$\left[\frac{2}{\pi}\arcsin\left(\sqrt{Q_p}\right)\right]^{\lambda}\left[2 - \frac{2}{\pi}\arcsin\left(\sqrt{Q_p}\right)\right]^{\lambda\beta} = p, \quad p \in (0,1). \qquad (20)$$


On the Family of Generalized Topp-Leone Arcsin Distributions 101 


Taking $\frac{2}{\pi}\arcsin\left(\sqrt{Q_p}\right) = t^{\beta}$ (for $\beta \neq 0$), we can obtain the pth quantile of the ETLASD by solving the following non-linear equation,

$$t^{\beta+1} - 2t + p^{1/(\lambda\beta)} = 0. \qquad (21)$$


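The QF for some given $\beta$ values are as follows:

I. For $\beta = 1$, $Q_p = \sin^2\left(\frac{\pi}{2}\left(1 - \sqrt{1 - p^{1/\lambda}}\right)\right)$.

II. For $\beta = 0$, $Q_p = \sin^2\left(\frac{\pi}{2}\,p^{1/\lambda}\right)$.

III. For $\beta = -1$, $Q_p = \sin^2\left(\frac{\pi\,p^{1/\lambda}}{1 + p^{1/\lambda}}\right)$.

IV. For $\beta = -2$, $Q_p = \sin^2\left(\frac{\pi}{2}T(p)\right)$, where $T(p) = \dfrac{1 + 4p^{1/\lambda} - \sqrt{1 + 8p^{1/\lambda}}}{2\,p^{1/\lambda}}$.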


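V. For $\beta = -3$, $Q_p = \sin^2\left(\frac{\pi}{2}T(p)\right)$, where $T(p) = 2 - u(p)$ and $u(p)$ is the real root of the cubic $p^{1/\lambda}u^3 + u - 2 = 0$, i.e., by Cardano's formula with $A = p^{-1/\lambda}$,

$$u(p) = \left(A + \sqrt{A^2 + A^3/27}\right)^{1/3} + \left(A - \sqrt{A^2 + A^3/27}\right)^{1/3}.$$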
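The quantile cases above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are assumptions, the closed forms I to IV are used where available, and for any other $\beta \le 1$ the equation $t(2-t)^{\beta} = p^{1/\lambda}$ with $t = \frac{2}{\pi}\arcsin(\sqrt{Q_p})$ is solved by bisection (a generic root finder chosen here for simplicity).

```python
import math

def etlasd_cdf(x, lam, beta):
    """ETLASD CDF with G(x) = (2/pi) * arcsin(sqrt(x))."""
    G = (2.0 / math.pi) * math.asin(math.sqrt(x))
    return G ** lam * (2.0 - G) ** (lam * beta)

def t_root(q, beta):
    """Solve t * (2 - t)**beta = q for t in (0, 1) by bisection.

    The left-hand side is increasing in t on (0, 1) whenever beta <= 1.
    """
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (2.0 - mid) ** beta < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def etlasd_quantile(p, lam, beta):
    """p-th quantile: closed forms for beta in {1, 0, -1, -2}, bisection otherwise."""
    q = p ** (1.0 / lam)
    if beta == 1:
        t = 1.0 - math.sqrt(1.0 - q)
    elif beta == 0:
        t = q
    elif beta == -1:
        t = 2.0 * q / (1.0 + q)
    elif beta == -2:
        t = (1.0 + 4.0 * q - math.sqrt(1.0 + 8.0 * q)) / (2.0 * q)
    else:
        t = t_root(q, beta)
    return math.sin(0.5 * math.pi * t) ** 2

if __name__ == "__main__":
    for beta in (1, 0, -1, -2, -3):
        x = etlasd_quantile(0.5, 2.0, beta)   # median for lambda = 2
        print(beta, x, etlasd_cdf(x, 2.0, beta))
```

The last printed column should return the input probability, which confirms that the quantile inverts the CDF (3).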


The median, inter-quantile range, and coefficients of skewness and kurtosis based on quantiles are given by,

(Median) $M = Q_{0.5}$,

(Inter-quantile range) $IQR = Q_{0.75} - Q_{0.25}$,

(Galton's coefficient) $S = \dfrac{Q_{0.25} + Q_{0.75} - 2M}{Q_{0.75} - Q_{0.25}}$ with $S \in [-1, 1]$,

(Moors's coefficient) $T = \dfrac{Q_{0.875} - Q_{0.625} + Q_{0.375} - Q_{0.125}}{Q_{0.75} - Q_{0.25}} - 1.23$.
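These quantile-based measures only need a quantile function. The sketch below is illustrative (function names are assumptions) and uses the closed-form quantile for $\beta = 0$, $Q_p = \sin^2(\frac{\pi}{2}p^{1/\lambda})$, as the example input.

```python
import math

def galton_skewness(Q):
    """Galton's coefficient S from a quantile function Q(p)."""
    return (Q(0.25) + Q(0.75) - 2.0 * Q(0.5)) / (Q(0.75) - Q(0.25))

def moors_kurtosis(Q):
    """Moors's coefficient T, centred so that the normal gives about 0."""
    num = Q(0.875) - Q(0.625) + Q(0.375) - Q(0.125)
    return num / (Q(0.75) - Q(0.25)) - 1.23

def etlasd_quantile_beta0(lam):
    """Quantile function of the ETLASD for beta = 0 (closed form)."""
    return lambda p: math.sin(0.5 * math.pi * p ** (1.0 / lam)) ** 2

if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        Q = etlasd_quantile_beta0(lam)
        print(lam, Q(0.5), Q(0.75) - Q(0.25), galton_skewness(Q), moors_kurtosis(Q))
```

For $\lambda = 1$ and $\beta = 0$ the ETLASD is the arcsine distribution, which is symmetric about 1/2, so $S$ is zero up to rounding and $T$ is negative.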


The shapes of the distribution can be identified as,

• If $S = 0$, the distribution is symmetric (as the normal distribution is), and if $S < 0$ ($S > 0$), the distribution is left (right) skewed.

• If $T = 0$, the distribution is mesokurtic (as the normal distribution is), and if $T < 0$ ($T > 0$), the distribution is platykurtic (leptokurtic).

We sketch the curves for T, S, IQR and M with varying parameters in Figure 3. The ETLASD has mesokurtic and platykurtic shapes for its density depending upon its parameter values. It also provides symmetric, positively skewed and negatively skewed shapes. The median increases with increasing values of the parameters. The IQR curve changes from a highly skewed to a flat shape as the parameter $\lambda$ increases.


5. Shannon entropy measure 


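A classical measure of uncertainty for a given RV $X$ is the differential entropy, known as Shannon entropy, which is defined by $I_s = E[-\log f(X)]$. For the ETLASD, the Shannon entropy is given by,

$$I_s = -\Big\{\log(\lambda) - \log(\pi) - \tfrac{1}{2}E\left[\log(X(1-X))\right] + (\lambda-1)E\left[\log\left(\tfrac{2}{\pi}\arcsin(\sqrt{X})\right)\right] + E\left[\log\left(2-(\beta+1)\tfrac{2}{\pi}\arcsin(\sqrt{X})\right)\right] + (\lambda\beta-1)E\left[\log\left(2-\tfrac{2}{\pi}\arcsin(\sqrt{X})\right)\right]\Big\}. \qquad (22)$$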
Figure 3: The skewness, kurtosis, median and IQR of the ETLASD($\lambda$, $\beta$).


As a statistical average, $I_s$ measures the expected uncertainty contained in the PDF and the predictability of the outcome $X$. It has numerous applications in various fields such as data communication, physics, combinatorics and others. The elements appearing in (22) are separately derived, which are as follows:


1 
E ow (2aresin(v)) | = af A(2— (8 + 1)t)t?-“Mog(t)(2 — £)?* "at (23) 
0 
op (8+ 1)3Fo(A+1,A41,1— BAA + 2,A4 2; 4) 
(A+ 1)? 
23F% ater) 
2 


E ow (2 = 2 aresin( 2) | = a BOS (B+ DOr Mos 2 =O tds. 94) 


= 2+) Bi (A +1, Bd), 


E(log(1 — X)) = > FS) (s(i.r) -s'(i,r)) = 3 s(i,r)F (m+ ao + (25) 
i,r=0 i,r=0 
Lorene (mei =) ; 
E(log(X(1 — X)) = 5 i (« ( ae _) +r rn Ast) +F(5)) (s(i,r) —s (ir) + 
| (26) 
2 Slane (m+ aad) deer) (r (m+i+ A=) mom) : 
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E tog (2 —(6+1) 2 aresin( V2) = | 2=(8 + DOO R=" ng = (8 4 wd (27) 
Bd = 1 B+1 
=5 (eee = ae z) ae) 


Combining the elements (24), (25), (26) and (27), the Shannon entropy is given by, 


Does tog AC 9)) oe" (@ — Ara)? (@ +1)? 3Fb (a+ 1X dt = BABA 5) = 


(B+1)Fi (A+ 1;-Bd, 1,44 2; 4, 4+) 
A+1 


By(A+1,2)) - ; (s (r (m+ it) rm 3 (m+ va +F(5)) (s(i,r) = s'(i,r)) + 
a s(t, r)F (m+ a) + sen (r (m t1+ A=") mom) : 


6. Maximum likelihood estimation 


+ 62°*4(8 — 1)x 


23h> (Aa1-prr+1a+i5)) + 


We use the MLE approach to estimate the unknown ETLASD parameters. This is attained by maximizing the likelihood function so that, under the assumed statistical model, the observed data are most probable. The logic of the MLE is both intuitive and flexible. Let $x = (x_1, x_2, \ldots, x_n)$ be a random sample of size $n$ drawn from the ETLASD. The log-likelihood function, based on the sample $x$, is given by,


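$$\ell(\Theta|x) = n\log\lambda - n\log\pi - \frac{1}{2}\sum_{i=1}^{n}\log\left(x_i(1-x_i)\right) + (\lambda-1)\sum_{i=1}^{n}\log\left(\frac{2}{\pi}\arcsin(\sqrt{x_i})\right)$$

$$\qquad + \sum_{i=1}^{n}\log\left(2-(\beta+1)\frac{2}{\pi}\arcsin(\sqrt{x_i})\right) + (\lambda\beta-1)\sum_{i=1}^{n}\log\left(2-\frac{2}{\pi}\arcsin(\sqrt{x_i})\right). \qquad (28)$$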


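To obtain the MLEs $(\hat\lambda, \hat\beta)$, we maximize the log-likelihood function given in (28). The MLEs can be determined by numerically solving the following log-likelihood equations, where $G_i = \frac{2}{\pi}\arcsin(\sqrt{x_i})$,

$$\frac{\partial \ell(\Theta|x)}{\partial\lambda} = \frac{n}{\lambda} + \sum_{i=1}^{n}\log(G_i) + \beta\sum_{i=1}^{n}\log(2-G_i) = 0, \qquad (29)$$

$$\frac{\partial \ell(\Theta|x)}{\partial\beta} = -\sum_{i=1}^{n}\frac{G_i}{2-(\beta+1)G_i} + \lambda\sum_{i=1}^{n}\log(2-G_i) = 0. \qquad (30)$$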


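From equation (29), $\hat\lambda$ is obtained by

$$\hat\lambda = \frac{-n}{\sum_{i=1}^{n}\log\left(\frac{2}{\pi}\arcsin(\sqrt{x_i})\right) + \beta\sum_{i=1}^{n}\log\left(2-\frac{2}{\pi}\arcsin(\sqrt{x_i})\right)},$$

where $\beta$ can be uniquely determined by solving the following non-linear equation, obtained by substituting $\hat\lambda$ into (30) and writing $G_i = \frac{2}{\pi}\arcsin(\sqrt{x_i})$,

$$\sum_{i=1}^{n}\frac{G_i}{2-(\beta+1)G_i} + \frac{n\sum_{i=1}^{n}\log(2-G_i)}{\sum_{i=1}^{n}\log(G_i) + \beta\sum_{i=1}^{n}\log(2-G_i)} = 0.$$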
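For interval estimation, we compute the information matrix, which is given by,

$$I(\Theta) = \begin{pmatrix} I_{\lambda\lambda} & I_{\lambda\beta} \\ I_{\lambda\beta} & I_{\beta\beta} \end{pmatrix},$$

where, with $G_i = \frac{2}{\pi}\arcsin(\sqrt{x_i})$, the entries are the negatives of the second-order partial derivatives of $\ell(\Theta|x)$: $I_{\lambda\lambda} = \frac{n}{\lambda^2}$, $I_{\lambda\beta} = -\sum_{i=1}^{n}\log(2-G_i)$ and $I_{\beta\beta} = \sum_{i=1}^{n}\frac{G_i^2}{[2-(\beta+1)G_i]^2}$. The inverse information matrix is given by,

$$I(\Theta)^{-1} = \frac{1}{I_{\lambda\lambda}I_{\beta\beta} - I_{\lambda\beta}^2}\begin{pmatrix} I_{\beta\beta} & -I_{\lambda\beta} \\ -I_{\lambda\beta} & I_{\lambda\lambda} \end{pmatrix}.$$

It is well known that the asymptotic distribution of the MLE is normal, i.e., approximately $\hat\Theta - \Theta \sim N(0, I(\hat\Theta)^{-1})$, and it can be used to construct approximate confidence intervals for the parameters $\lambda$ and $\beta$. The asymptotic $100(1-\alpha)\%$ confidence intervals for $\lambda$ and $\beta$ are $\hat\lambda \pm z_{\alpha/2}\sqrt{\mathrm{var}(\hat\lambda)}$ and $\hat\beta \pm z_{\alpha/2}\sqrt{\mathrm{var}(\hat\beta)}$, where $z_{\alpha/2}$ is the $(1-\alpha/2)$th quantile of $N(0,1)$ and $\mathrm{var}(\cdot)$ is the corresponding diagonal element of $I(\hat\Theta)^{-1}$.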
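The estimation steps of this section can be sketched as a profile likelihood: for a fixed $\beta$, equation (29) gives $\hat\lambda$ in closed form, and the log-likelihood (28) is then maximized over a grid of $\beta$ values. This is a simplified illustration under stated assumptions, not the procedure used in the chapter (which relies on R and the PSO method); the function names, the grid search and the small data vector are hypothetical.

```python
import math

def G(x):
    """Arcsine baseline CDF."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

def loglik(x, lam, beta):
    """ETLASD log-likelihood, equation (28)."""
    n = len(x)
    total = n * math.log(lam) - n * math.log(math.pi)
    for xi in x:
        g = G(xi)
        total += (-0.5 * math.log(xi * (1.0 - xi))
                  + (lam - 1.0) * math.log(g)
                  + math.log(2.0 - (beta + 1.0) * g)
                  + (lam * beta - 1.0) * math.log(2.0 - g))
    return total

def lambda_hat(x, beta):
    """Closed-form solution of the score equation (29) for a given beta."""
    a = sum(math.log(G(xi)) for xi in x)
    b = sum(math.log(2.0 - G(xi)) for xi in x)
    return -len(x) / (a + beta * b)

def profile_mle(x, betas):
    """Grid search over beta; returns (lambda_hat, beta_hat, loglik)."""
    best = None
    for beta in betas:
        lam = lambda_hat(x, beta)
        ll = loglik(x, lam, beta)
        if best is None or ll > best[2]:
            best = (lam, beta, ll)
    return best

if __name__ == "__main__":
    data = [0.05, 0.1, 0.2, 0.4, 0.6, 0.85]          # hypothetical unit-range data
    grid = [-2.0 + k / 20.0 for k in range(61)]      # beta from -2 to 1
    print(profile_mle(data, grid))
```

A finer grid or a one-dimensional optimizer over $\beta$ would refine the estimate; the grid is used here only to keep the sketch dependency-free.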


7. Stochastic ordering 


In the case of proportion data, comparison of two independent variables may be of interest. Therefore, we study the ordering of two ETLASD RVs. Let $X$ and $Y$ be two non-identical ETLASD RVs. Stochastic ordering is defined in terms of various statistical functions such as the CDF, HRF, mean residual life function and likelihood ratio. The basic definition of the stochastic ordering is as follows. A RV $X$ is said to be stochastically greater than $Y$ if $F_X(x) \le F_Y(x)$ for all $x$, and it is denoted by $X \ge_{st} Y$. We discuss here the stochastic ordering using the likelihood ratio.


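Proposition 2 Suppose $X \sim$ ETLASD($\lambda_1$, $\beta_1$) and $Y \sim$ ETLASD($\lambda_2$, $\beta_2$). Then, we have the following cases of stochastic ordering in terms of the likelihood ratio.

1. When $\beta_1 = \beta_2 = \beta \le 1$ and $\lambda_1 > \lambda_2$, $Y \le_{lr} X$.
2. When $\lambda_1 = \lambda_2 = \lambda$ and $\beta_2 > \beta_1$, $Y \le_{lr} X$.

Proof The likelihood ratio for the ETLASD is given by, with $G(x) = \frac{2}{\pi}\arcsin(\sqrt{x})$,

$$\frac{f_X(x;\lambda_1,\beta_1)}{f_Y(x;\lambda_2,\beta_2)} = \frac{\lambda_1}{\lambda_2}\,[G(x)]^{\lambda_1-\lambda_2}\;\frac{2-(\beta_1+1)G(x)}{2-(\beta_2+1)G(x)}\;[2-G(x)]^{\lambda_1\beta_1-\lambda_2\beta_2}.$$

We first take $\beta_1 = \beta_2 = \beta$ and differentiate the log-likelihood ratio with respect to $x$, which gives,

$$\frac{d}{dx}\log\left(\frac{f_X(x;\lambda_1,\beta)}{f_Y(x;\lambda_2,\beta)}\right) = (\lambda_1-\lambda_2)\,\frac{2-(\beta+1)G(x)}{\pi\sqrt{x(1-x)}\;G(x)\,[2-G(x)]}.$$

We have $\frac{d}{dx}\log\left(f_X/f_Y\right) > 0$ for all $x$ when $\lambda_1 > \lambda_2$. Therefore, $Y \le_{lr} X$ when $\lambda_1 > \lambda_2$ and $\beta_1 = \beta_2 = \beta \le 1$.

For $\lambda_1 = \lambda_2 = \lambda$, differentiating the log-likelihood ratio with respect to $x$ gives,

$$\frac{d}{dx}\log\left(\frac{f_X(x;\lambda,\beta_1)}{f_Y(x;\lambda,\beta_2)}\right) = \frac{\beta_2-\beta_1}{\pi\sqrt{x(1-x)}}\left\{\frac{2}{[2-(\beta_1+1)G(x)]\,[2-(\beta_2+1)G(x)]} + \frac{\lambda}{2-G(x)}\right\},$$

which is greater than 0 if $\beta_2 > \beta_1$. Thus, $Y \le_{lr} X$ when $\beta_2 > \beta_1$ and $\lambda_1 = \lambda_2 = \lambda > 0$.
The ETLASD also holds ordering in the HRF and mean residual life following the implications stated by Shaked and Shanthikumar (1994) as under:

$$X \ge_{lr} Y \ \Rightarrow\ X \ge_{hr} Y \ \Rightarrow\ X \ge_{mrl} Y, \qquad X \ge_{hr} Y \ \Rightarrow\ X \ge_{st} Y.$$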
8. Identifiability and stress strength reliability


For any two RVs X and Y, the stress-strength reliability parameter is specified as R = P[X > Y] where RV 
X represents the strength and Y represents the stress. It measures the probability that the system has enough 
strength to overcome the stress. Johnson (1988) discussed the three stress strength models which are applied 
to a vast number of applications in civil, mechanical and aerospace engineering. 

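If $X \sim$ ETLASD($\lambda_1$, $\beta_1$) and $Y \sim$ ETLASD($\lambda_2$, $\beta_2$), then we have,

$$R = \frac{\lambda_1\left(\beta_1 - \lambda_2(\beta_1-\beta_2)\,2^{\lambda_1(\beta_1+1)+\lambda_2(\beta_2+1)}\;\mathrm{Beta}\left[\tfrac{1}{2},\ \lambda_1+\lambda_2,\ 1+\beta_1\lambda_1+\beta_2\lambda_2\right]\right)}{\beta_1\lambda_1+\beta_2\lambda_2},$$

where $\mathrm{Beta}[z, a, b] = \int_0^z u^{a-1}(1-u)^{b-1}\,du$ denotes the incomplete beta function.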

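Identifiability is an important property which a model must hold for precise inferences. A model is said to be identifiable when it is theoretically possible to find the true value of the parameter when we obtain an infinite number of observations. Let $\mathcal{F} = \{F_\theta;\ \theta \in \Theta\}$ be a statistical model; the parameter $\theta$ is said to be identifiable if the mapping $\theta \mapsto F_\theta$ is one to one, i.e.,

$$F_{\theta_1} = F_{\theta_2} \ \Rightarrow\ \theta_1 = \theta_2, \quad \forall\ \theta_1, \theta_2 \in \Theta.$$

To prove the identification property, we use Theorem 1 of Basu and Ghosh (1980), which states that the density ratio $f_1(x;\theta_1)/f_2(x;\theta_2)$ of two different members of the family defined on the domain $(a, b)$ converges to either 0 or $\infty$ when $x \to a$. For the ETLASD, since $f(x;\lambda,\beta) \approx \lambda\,2^{\lambda\beta}\,g(x)\,[G(x)]^{\lambda-1}$ as $x \to 0$, we have

$$\lim_{x\to 0}\frac{f_1(x;\lambda_1,\beta_1)}{f_2(x;\lambda_2,\beta_2)} = \begin{cases} 0 & \text{if } \lambda_1 > \lambda_2, \\ \infty & \text{if } \lambda_1 < \lambda_2, \\ 2^{\lambda(\beta_1-\beta_2)} & \text{if } \lambda_1 = \lambda_2 = \lambda, \end{cases}$$

so the limit equals 1 only if $\lambda_1 = \lambda_2$ and $\beta_1 = \beta_2$. Therefore, the ETLASD is identifiable in the parameters $(\lambda, \beta)$.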


9. Empirical study 


In this section, an application to real-life data is demonstrated to show the usefulness of the ETLASD over the Beta, Kumaraswamy, GTL and UG distributions. All distributions are indexed by two parameters and defined on the unit interval. We consider a data set that studies the performance of an algorithm called SC16. The data set was first used by Caramanis et al. (1983). Altun and Hamedani (2018) also used this data set to demonstrate the application of the log-xgamma distribution.

From Table 1, it is seen that the coefficient of skewness $\gamma_1 > 0$, which reveals that the distribution is positively skewed. As the coefficient of kurtosis $\gamma_2 < 3$, the distribution is platykurtic. In order to identify the shape of the empirical hazard function, we use the TTT plot given by Aarset (1987). Let $t_{i:n}$, $i = 1, 2, \ldots, n$, denote the ith ordered observation of the sample. The scaled empirical TTT transform is obtained by,


$$\phi_n(s/n) = \left[\sum_{i=1}^{s} t_{i:n} + (n-s)\,t_{s:n}\right] \Big/ \sum_{i=1}^{n} t_{i:n}.$$


The consecutive points $(s/n, \phi_n(s/n))$ ($s = 0, 1, 2, \ldots, n$) are connected by straight lines (linear interpolation) to get the TTT plot. From Figure 4b, we can see that the SC16 data set accommodates a bathtub shaped hazard function, as the TTT plot is initially convex and then concave. Since the characteristics of the ETLASD match those of the data, the distribution can be used for fitting this data set. We use the goodness.fit() command in R software along with the PSO method for the estimation of the parameters.
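The scaled TTT transform defined above can be computed directly from the ordered sample. The following is a minimal sketch (the function name `scaled_ttt` and the toy data are assumptions, and the value at $s = 0$ is taken to be 0); plotting the returned points joined by straight lines gives the TTT plot.

```python
def scaled_ttt(sample):
    """Return the points (s/n, phi_n(s/n)) for s = 0, 1, ..., n."""
    t = sorted(sample)
    n = len(t)
    total = sum(t)
    points = [(0.0, 0.0)]
    running = 0.0
    for s in range(1, n + 1):
        running += t[s - 1]                       # sum of the s smallest values
        phi = (running + (n - s) * t[s - 1]) / total
        points.append((s / n, phi))
    return points

if __name__ == "__main__":
    for point in scaled_ttt([4.0, 1.0, 3.0, 2.0]):  # toy data, not the SC16 data
        print(point)
```

A curve that lies below the diagonal at first (convex) and above it later (concave) indicates a bathtub shaped hazard rate, as described in the text.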
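The PDFs of the beta, Kumaraswamy, UG and GTL distributions are given by,

$$f_{Beta}(y;\alpha,\beta) = \frac{1}{\mathrm{Beta}(\alpha,\beta)}\,y^{\alpha-1}(1-y)^{\beta-1}, \quad 0 < y < 1,\ \alpha, \beta > 0,$$

$$f_{Kumar}(y;\alpha,\beta) = \alpha\beta\,y^{\alpha-1}(1-y^{\alpha})^{\beta-1}, \quad 0 < y < 1,\ \alpha, \beta > 0,$$

$$f_{UG}(y;\nu,\tau) = \frac{\nu^{\tau}\,y^{\nu-1}(-\log(y))^{\tau-1}}{\Gamma(\tau)}, \quad 0 < y < 1,\ \nu, \tau > 0,$$

$$f_{GTL}(y;\alpha,\beta) = 2\alpha\beta\,y^{\alpha\beta-1}(1-y^{\alpha})(2-y^{\alpha})^{\beta-1}, \quad 0 < y < 1,\ \alpha, \beta > 0,$$

respectively.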

Table 2 lists the estimated parameters and the goodness-of-fit statistics for all the distributions for the SC16 data set. To assess whether the ETLASD provides a better fit of the unit range data than its competing distributions, we use graphical and statistical measures such as CM, AD, KS, AIC, CAIC, BIC and HQIC. The AIC, BIC, CAIC and HQIC quantify the relative information lost when the model is fitted to a real data set. The CM, K-S and AD statistics measure the discrepancy between the empirical and hypothetical distributions. The ETLASD has the smallest information criterion values among the distributions, and so it can be a reasonable choice for fitting U-shaped data having a bathtub shaped hazard rate. Figures 4a and 5 display the fitted density and fitted HRF plots for the distributions, respectively. This shows that the ETLASD gives a better fit of the data set than the other distributions.
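The information criteria can be computed from the maximized log-likelihood $\hat\ell$, the number of parameters $k$ and the sample size $n$. The sketch below uses the standard definitions; the CAIC expression is the small-sample corrected form, which is an assumption made here because it reproduces the tabulated values. The inputs $\hat\ell = 9.6865$ and $n = 23$ in the usage example are not stated in the text: they are back-calculated from the ETLASD row of Table 2.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC, CAIC and HQIC from the maximized log-likelihood."""
    aic = 2.0 * k - 2.0 * loglik
    bic = k * math.log(n) - 2.0 * loglik
    caic = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)   # assumed corrected form
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "BIC": bic, "CAIC": caic, "HQIC": hqic}

if __name__ == "__main__":
    # loglik and n inferred from Table 2 (ETLASD row); both are assumptions.
    for name, value in information_criteria(9.6865, k=2, n=23).items():
        print(name, round(value, 3))
```

Smaller values indicate a better trade-off between fit and model complexity, which is how the criteria are used in the comparison above.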


Table 1: Descriptive statistics of SC16 data. 


Statistic   SC16 data
Minimum     0.006
Maximum     0.866
Mean        0.2881
Median      0.1160
SD          0.318
Q1          0.0325
Q3          0.518
Skewness    0.767
Kurtosis    1.974


Table 2: MLEs of the parameters and goodness-of-fit statistics for the distributions based on the SC16 data. 


Model    MLEs             W      AD      K-S    p-value  AIC      BIC      CAIC     HQIC
ETLASD   (0.917, 0.925)   0.102  0.643   0.148  0.692    -15.373  -13.102  -14.773  -14.802
UG       (2.505, 1.220)   8.271  46.118  0.998  0.000    -3.560   -1.289   -2.960   -2.988
Beta     (0.539, 1.141)   0.114  0.712   0.161  0.586    -14.705  -12.434  -14.105  -14.134
Kumar    (0.602, 1.215)   0.112  0.700   0.243  0.130    -14.576  -12.305  -13.976  -14.000
GTL      (0.700, 1.024)   0.115  0.718   0.264  0.080    -9.999   -7.728   -9.399   -9.427
Figure 4: Fitted densities of the distributions and TTT plot for the SC16 data.


Figure 5: The empirical and fitted HRF of the ETLASD for the SC16 data set.


10. Conclusion 


In this chapter, we propose a new two-parameter distribution called the “extended Topp-Leone arcsine distribution” that accommodates several major shapes for its density function, such as J and reversed J, increasing, decreasing and bathtub shaped. We discussed its fundamental properties, which include the shapes of the HRF and PDF, moments, quantile function, stochastic ordering, entropy measure, identifiability and stress-strength reliability. We also studied the skewness and kurtosis of the distribution. It was observed that the ETLASD can be positively or negatively skewed with varying degrees of kurtosis. We demonstrated empirically the importance of the proposed distribution and its flexibility in fitting U-shaped density data having a bathtub shaped HRF. It was seen that the ETLASD is a better fitted model as compared to other unit distributions based on various information measures. Summing up, the ETLASD is a better choice for fitting real-life unit range data that have a U-shaped density and a bathtub shaped HRF. We hope that the distribution will be a stepping stone in the fields of clinical studies, engineering, algorithms, economics and others.
Abbreviations


PDF Probability density function
CDF Cumulative distribution function
TLD Topp-Leone distribution
RV Random variable
ETL Extended Topp-Leone
ASD Arcsin distribution
ETLASD Extended Topp-Leone arcsin distribution
HRF Hazard rate function
CF Characteristic function
QF Quantile function
MLE Maximum likelihood estimation
CM Cramér-von Mises
AD Anderson-Darling
KS Kolmogorov-Smirnov
AIC Akaike information criterion
CAIC Consistent Akaike information criterion
BIC Bayesian information criterion
HQIC Hannan-Quinn information criterion
GTL Generalized Topp-Leone
UG Unit gamma
PSO Particle swarm optimization
TTT Total time on test
Chapter 7


The Truncated Modified Lindley Generated 
Family of Distributions 


Lishamol Tomy,1,* Christophe Chesneau2 and Jiju Gillariose3


1. Introduction 


Lifetime distributions are important for the understanding of lifetime phenomena in various fields of applied 
science. This is essentially due to the need for appropriate statistical models to analyze a variety of data. 
Because of its many applications in various areas, the exponential distribution is regarded as one of the most 
important one-parameter distributions. The modifications or generalizations of the exponential distribution 
have attracted much attention, especially since the Lindley (L) distribution was invented by Lindley (1958). 
The L and similar models have been used in a variety of applications to solve problems from several fields, 
including quality management, environmental studies, health sciences, ecology, marketing, finance and 
insurance. Consequently, the L distribution is used as an alternative to the exponential distribution in many 
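statistical settings. The following one-parameter cumulative distribution function (cdf) governs it:

$$F_L(x;\lambda) = 1 - \left[1 + \frac{\lambda x}{1+\lambda}\right]e^{-\lambda x}, \quad x > 0,$$

where $\lambda > 0$, and $F_L(x;\lambda) = 0$ for $x \le 0$. The probability density function (pdf) is then calculated as follows: $f_L(x;\lambda) = F_L'(x;\lambda)$. That is,

$$f_L(x;\lambda) = \frac{\lambda^2}{1+\lambda}(1+x)\,e^{-\lambda x}, \quad x > 0,$$

and $f_L(x;\lambda) = 0$ for $x \le 0$.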
The modified L (ML) model has recently been presented by Chesneau et al. (2019) as a middle ground between the conventional exponential and the L distribution. It has a cdf that is specified by,

$$F_{ML}(x;\lambda) = 1 - \left[1 + \frac{\lambda x}{1+\lambda}\,e^{-\lambda x}\right]e^{-\lambda x}, \quad x > 0, \qquad (1)$$

with $\lambda > 0$ and $F_{ML}(x;\lambda) = 0$ for $x \le 0$, and the related pdf follows upon differentiation:

$$f_{ML}(x;\lambda) = \frac{\lambda}{1+\lambda}\left[(1+\lambda)e^{\lambda x} + 2\lambda x - 1\right]e^{-2\lambda x}, \quad x > 0,$$
and $f_{ML}(x;\lambda) = 0$ for $x \le 0$. As the main reference, Chesneau et al. (2019) discussed the applicability of the ML model and demonstrated its usability using a variety of real-world data sets. The fact that its pdf may be represented as a linear combination of exponential and gamma pdfs is an important structural property of the ML distribution. More recently, Chesneau et al. (2020a), Chesneau et al. (2020b) and Chesneau et al. (2020c) proposed the inverted ML distribution, two expansions of the ML distribution, and the wrapped ML distribution, respectively. This paper explores two new facets of the ML distribution. First, we apply the simple truncation method to the ML distribution to offer a novel distribution with a range of (0,1), called the truncated-(0,1) ML (TML) distribution. Such distributions are ideal for the modeling of probabilities or percentages that occur in many applied areas. We provide the minimum theory and practice regarding the TML distribution. Second, based on the TML distribution, we create a novel generated family of distributions, called the TML generated (TML-G) family. We study its general theoretical properties. Then, based on some specific distribution, we show its effectiveness in data fitting, with the consideration of important data sets.

The following is a summary of the rest of the paper: We introduce and investigate a new TML distribution in Section 2. In Section 3, we offer a novel TML model-generated family of distributions as well as a particular example of the derived family, in addition to its probabilistic features. Section 4 concludes with some final thoughts.


2. The TML distribution 


2.1 Presentation 


The TML distribution is defined by the simple truncated version of the ML distribution on the interval (0,1). That is, based on Equation (1), it is governed by the cdf given as $F_{TML}(x;\lambda) = F_{ML}(x;\lambda)/F_{ML}(1;\lambda)$, $x \in (0,1)$, $F_{TML}(x;\lambda) = 0$ for $x \le 0$, and $F_{TML}(x;\lambda) = 1$ for $x \ge 1$. Thus, for $x \in (0,1)$, we explicitly have,

$$F_{TML}(x;\lambda) = c_\lambda\left\{1 - \left[1 + \frac{\lambda x}{1+\lambda}\,e^{-\lambda x}\right]e^{-\lambda x}\right\}, \qquad (2)$$

where $c_\lambda = (1+\lambda)/\{1+\lambda-[1+\lambda(1+e^{-\lambda})]e^{-\lambda}\}$.
Thus, the corresponding pdf is defined by,

$$f_{TML}(x;\lambda) = d_\lambda\left[(1+\lambda)e^{\lambda x} + 2\lambda x - 1\right]e^{-2\lambda x}, \quad x \in (0,1),$$

where $d_\lambda = \lambda/\{1+\lambda-[1+\lambda(1+e^{-\lambda})]e^{-\lambda}\}$ and $f_{TML}(x;\lambda) = 0$ for $x \notin (0,1)$. With the truncated construction, the analytical properties of the pdf of the ML distribution are transposed to the unit interval (0,1). Thus, we can directly say that $f_{TML}(x;\lambda)$ is unimodal. Here, the pdf of the TML model is plotted in Figure 1 to see how its shape changes with variations in the value of the parameter.
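The TML cdf and pdf above translate directly into code. The sketch below is illustrative (function names are assumptions); it also includes the hazard rate as the ratio $f_{TML}/(1 - F_{TML})$ and checks numerically that the pdf is the derivative of the cdf.

```python
import math

def ml_cdf(x, lam):
    """cdf of the modified Lindley (ML) distribution for x > 0, Equation (1)."""
    return 1.0 - (1.0 + lam * x * math.exp(-lam * x) / (1.0 + lam)) * math.exp(-lam * x)

def d_const(lam):
    """Normalizing constant d_lambda of the TML pdf."""
    return lam / (1.0 + lam - (1.0 + lam * (1.0 + math.exp(-lam))) * math.exp(-lam))

def tml_cdf(x, lam):
    """TML cdf: the ML cdf truncated to (0, 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return ml_cdf(x, lam) / ml_cdf(1.0, lam)

def tml_pdf(x, lam):
    """TML pdf on (0, 1), zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    return d_const(lam) * ((1.0 + lam) * math.exp(lam * x) + 2.0 * lam * x - 1.0) * math.exp(-2.0 * lam * x)

def tml_hrf(x, lam):
    """TML hazard rate f / (1 - F) on (0, 1)."""
    return tml_pdf(x, lam) / (1.0 - tml_cdf(x, lam))

if __name__ == "__main__":
    lam, h = 2.0, 1e-6  # arbitrary illustrative parameter value
    for x in (0.3, 0.7):
        slope = (tml_cdf(x + h, lam) - tml_cdf(x - h, lam)) / (2.0 * h)
        print(x, tml_pdf(x, lam), slope, tml_hrf(x, lam))
```

The constant $c_\lambda$ does not appear explicitly because it equals $1/F_{ML}(1;\lambda)$, which is how `tml_cdf` performs the truncation.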


Figure 1: Examples of graphs of the pdf of the TML distribution.

Figure 2: Examples of graphs of the hrf of the TML distribution.


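Based on the two functions above, the hazard rate function (hrf) follows immediately:

$$h_{TML}(x;\lambda) = \frac{f_{TML}(x;\lambda)}{1 - F_{TML}(x;\lambda)} = \frac{d_\lambda\left[(1+\lambda)e^{\lambda x} + 2\lambda x - 1\right]e^{-2\lambda x}}{1 - c_\lambda\left\{1 - \left[1 + \lambda x e^{-\lambda x}/(1+\lambda)\right]e^{-\lambda x}\right\}}, \quad x \in (0,1),$$

and $h_{TML}(x;\lambda) = 0$ for $x \notin (0,1)$. Figure 2 shows some possible shapes of the hrf. We can see that the hrf can be increasing or increasing-decreasing.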


2.2 Some theory 
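The following power series expansion holds for $f_{TML}(x;\lambda)$.


Proposition 2.1 The pdf of the TML distribution has the following power series expansion:

$$f_{TML}(x;\lambda) = \sum_{k=0}^{+\infty} u_k\,x^k + \sum_{k=0}^{+\infty} v_k\,x^{k+1},$$

where,

$$u_k = d_\lambda\,\frac{(-\lambda)^k}{k!}\left[(1+\lambda) - 2^k\right], \qquad v_k = -d_\lambda\,\frac{(-2\lambda)^{k+1}}{k!}. \qquad (3)$$

Proof. We have,

$$f_{TML}(x;\lambda) = d_\lambda\left[(1+\lambda)e^{-\lambda x} + 2\lambda x\,e^{-2\lambda x} - e^{-2\lambda x}\right]$$

$$= d_\lambda\left[(1+\lambda)\sum_{k=0}^{+\infty}\frac{(-\lambda)^k}{k!}x^k + 2\lambda x\sum_{k=0}^{+\infty}\frac{(-2\lambda)^k}{k!}x^k - \sum_{k=0}^{+\infty}\frac{(-2\lambda)^k}{k!}x^k\right]$$

$$= d_\lambda\left[\sum_{k=0}^{+\infty}\frac{(-\lambda)^k}{k!}\left[(1+\lambda) - 2^k\right]x^k - \sum_{k=0}^{+\infty}\frac{(-2\lambda)^{k+1}}{k!}x^{k+1}\right]$$

$$= \sum_{k=0}^{+\infty} u_k\,x^k + \sum_{k=0}^{+\infty} v_k\,x^{k+1}.$$

The proof of Proposition 2.1 ends. □
This series expansion will be useful in determining some important properties of the TML model and the proposed family.
Let us now introduce a random variable $X$ with the TML distribution. The (raw) moments of $X$ are examined in the next result.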
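Proposition 2.2 Let us consider the incomplete gamma function given by $\gamma(s,x) = \int_0^x t^{s-1}e^{-t}\,dt$, $x > 0$. Then the $r$th (raw) moment of $X$ is given by,

$$\mu'_r = E(X^r) = \frac{d_\lambda(1+\lambda)}{\lambda^{r+1}}\left\{\gamma(r+1,\lambda) + \frac{1}{2^{r+1}(1+\lambda)}\left[\gamma(r+2,2\lambda) - \gamma(r+1,2\lambda)\right]\right\}.$$

Proof. From the definition of $f_{TML}(x;\lambda)$ and appropriate changes of variables, we get

$$\mu'_r = \int_0^1 x^r f_{TML}(x;\lambda)\,dx$$

$$= d_\lambda\left[(1+\lambda)\int_0^1 x^r e^{-\lambda x}\,dx + 2\lambda\int_0^1 x^{r+1}e^{-2\lambda x}\,dx - \int_0^1 x^r e^{-2\lambda x}\,dx\right]$$

$$= \frac{d_\lambda(1+\lambda)}{\lambda^{r+1}}\left\{\gamma(r+1,\lambda) + \frac{1}{2^{r+1}(1+\lambda)}\left[\gamma(r+2,2\lambda) - \gamma(r+1,2\lambda)\right]\right\}.$$

Hence the desired result. □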
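Based on Proposition 2.2, the mean of $X$ is obtained as,

$$\mu'_1 = \frac{d_\lambda(1+\lambda)}{\lambda^2}\left\{\gamma(2,\lambda) + \frac{1}{4(1+\lambda)}\left[\gamma(3,2\lambda) - \gamma(2,2\lambda)\right]\right\}$$

$$= \frac{d_\lambda(1+\lambda)}{\lambda^2}\left\{1 - (1+\lambda)e^{-\lambda} - \frac{1}{4(1+\lambda)}\left[4\lambda^2e^{-2\lambda} + 2\lambda e^{-2\lambda} + e^{-2\lambda} - 1\right]\right\}.$$

The variance, standard deviation, moment-skewness and moment-kurtosis can be expressed in a similar manner.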
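The closed-form mean can be cross-checked against direct numerical integration of $x f_{TML}(x;\lambda)$. The sketch below is illustrative only: function names are assumptions and composite Simpson's rule is used as a generic quadrature, not as part of the chapter's method.

```python
import math

def d_const(lam):
    """Normalizing constant d_lambda of the TML pdf."""
    return lam / (1.0 + lam - (1.0 + lam * (1.0 + math.exp(-lam))) * math.exp(-lam))

def tml_pdf(x, lam):
    """TML pdf on (0, 1)."""
    return d_const(lam) * ((1.0 + lam) * math.exp(lam * x) + 2.0 * lam * x - 1.0) * math.exp(-2.0 * lam * x)

def tml_mean(lam):
    """Closed-form mean of the TML distribution."""
    e1, e2 = math.exp(-lam), math.exp(-2.0 * lam)
    inner = (1.0 - (1.0 + lam) * e1
             - ((4.0 * lam ** 2 + 2.0 * lam + 1.0) * e2 - 1.0) / (4.0 * (1.0 + lam)))
    return d_const(lam) * (1.0 + lam) / lam ** 2 * inner

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return total * h / 3.0

if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        numeric = simpson(lambda x: x * tml_pdf(x, lam), 0.0, 1.0)
        print(lam, tml_mean(lam), numeric)
```

The same quadrature applied to $x^r f_{TML}(x;\lambda)$ gives a check of Proposition 2.2 for other values of $r$.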
The incomplete moments of X can be expressed in a similar manner; the above integral terms just need 
to be truncated at ¢ € (0,1). The final result is clarified in the following proposition. 


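Proposition 2.3 Let us consider the indicator function over an event $B$ denoted by $1_B$. Then, for $t \in (0,1)$, the $r$th incomplete moment of $X$ at $t$ is given by,

$$\mu'_r(t) = E\left(X^r\,1_{\{X \le t\}}\right) = \frac{d_\lambda(1+\lambda)}{\lambda^{r+1}}\left\{\gamma(r+1,\lambda t) + \frac{1}{2^{r+1}(1+\lambda)}\left[\gamma(r+2,2\lambda t) - \gamma(r+1,2\lambda t)\right]\right\}.$$

The proof is similar to the one of Proposition 2.2. It is thus omitted.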
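Based on Proposition 2.3, the incomplete mean of $X$ at $t$ is obtained as,

$$\mu'_1(t) = \frac{d_\lambda(1+\lambda)}{\lambda^2}\left\{\gamma(2,\lambda t) + \frac{1}{4(1+\lambda)}\left[\gamma(3,2\lambda t) - \gamma(2,2\lambda t)\right]\right\}$$

$$= \frac{d_\lambda(1+\lambda)}{\lambda^2}\left\{1 - (1+\lambda t)e^{-\lambda t} - \frac{1}{4(1+\lambda)}\left[4\lambda^2t^2e^{-2\lambda t} + 2\lambda t\,e^{-2\lambda t} + e^{-2\lambda t} - 1\right]\right\}.$$

It is of particular importance, since it naturally appears in useful mean residual life functions and income curves.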

2.3 Application 


Here we use the TML distribution as a statistical model, assuming that $\lambda$ is unknown.
2.3.1 Estimation of the parameter $\lambda$


We estimate the unique parameter / by using the most popular method: the maximum likelihood method. 
Thus, the obtained estimate, say /, satisfies: 


A= argmax, L(x},...,%,3 A) 


where, x),..., x, denote all realisations of independent random variables Xj,..., X,, with the common TML 
distribution, and L(x,..., x,; 4) is the likelihood function defined by, 


LX 15005 3 A) = Diet fr A) 
= dj e* = "TT, (1 + Ae + 2dx;- 1]. 


Equivalently, we can also use the logarithm of the likelihood function to have the following relationship: 
2 = argmax, logL(x,,..., X,; 4). The obtained estimates are known to have desirable probabilistic properties, 
which enable the determination of confidence intervals and various statistical tests on A. The standard errors 
(SEs) and other basic properties of the estimates are also available, and provided numerically in all the 
statistical software, such as R or SAS. 
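
To illustrate the maximisation step, here is a minimal Python sketch that maximises the TML log-likelihood by a golden-section search. The sample values, the search interval [0.01, 30] and the helper names are hypothetical; the search also assumes that the log-likelihood is unimodal on that interval. It is not the routine used to produce the results reported in this chapter.

```python
import math

def tml_loglik(lam, data):
    """Log-likelihood of the TML model for observations in (0, 1)."""
    e = math.exp(-lam)
    c = 1.0 / (1.0 - (1.0 + lam / (1.0 + lam) * e) * e)  # assumed normalising constant
    d = c * lam / (1.0 + lam)
    total = len(data) * math.log(d)
    for x in data:
        total += -2.0 * lam * x + math.log((1.0 + lam) * math.exp(lam * x) + 2.0 * lam * x - 1.0)
    return total

def golden_max(f, lo, hi, tol=1e-7):
    """Golden-section search for a maximiser of f on [lo, hi] (assumes unimodality)."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            # the maximiser lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:
            # the maximiser lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)

# Hypothetical observations in (0, 1), for illustration only.
sample = [0.05, 0.08, 0.12, 0.15, 0.21, 0.26, 0.30, 0.33, 0.41, 0.47, 0.55, 0.62, 0.74, 0.88]
lam_hat = golden_max(lambda l: tml_loglik(l, sample), 0.01, 30.0)
print(lam_hat, tml_loglik(lam_hat, sample))
```

A derivative-free search is used here only to keep the sketch dependency-free; in practice a general-purpose optimiser would also return the standard error.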


2.3.2 Data Fitting 


In this subsection, we illustrate a situation where the TML model is applicable. We consider a real data set to assess the advantage of the TML distribution over the power distribution, which has the following cdf:

$$G(x;a) = x^a, \quad x \in (0,1),$$

with $G(x;a) = 0$ for $x \le 0$ and $G(x;a) = 1$ for $x \ge 1$, where $a > 0$ is a shape parameter. For $a = 1$, it reduces to the uniform distribution on the interval (0,1). Parameters are estimated by maximum likelihood. The source of the data set is Klein and Moeschberger (2006). It gives the time, in months, until infection for kidney dialysis patients. For previous studies on this data set, see Bantan et al. (2020) and Chesneau et al. (2021). We normalize the data by dividing each value by 30, to get data between 0 and 1. The transformed data lie between 0.08333333 and 0.91666667.

To assess the adequacy of the models, we estimate the unknown parameters by the standard maximum likelihood method and then compare the SEs, the estimated -logL, the AIC (Akaike Information Criterion), the BIC (Bayesian Information Criterion), the Kolmogorov-Smirnov (K-S) statistic and the respective p-value.

Table 1 provides these summaries for the fitted TML and power models for the data set. From the findings evaluated, the smallest -logL, AIC, BIC, K-S statistic and the highest p-values are obtained for the TML model. Moreover, Figures 3a and 3b present the estimated pdfs and estimated cdfs for the data set. Therefore, it can be concluded that for the considered data set, the TML distribution is a better empirical model than the power model.


Table 1: Comparison criteria for the data set.

Distribution   Estimate (SE)      -logL     AIC        BIC        K-S       p-value
TML            2.0165 (0.5564)    3.1959    4.3917     3.0595     0.1344    0.6930
Power          0.8351 (0.1578)    -0.4829   1.034179   2.366383   0.24084   0.07769

Figure 3: Fitted pdf plots for data set.


3. The TML-G family 
3.1 Presentation 


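The TML distribution is a springboard for further developments in distribution theory, starting with the construction of the TML-G family of distributions. Given the cdf of a continuous distribution, say $G(x;\zeta)$, where $\zeta$ denotes a certain vector of parameters, and Equation (2), the TML-G family is defined by the cdf

$$F_{TML-G}(x;\lambda,\zeta) = c_\lambda\left\{1 - \left[1 + \frac{\lambda G(x;\zeta)}{1+\lambda}e^{-\lambda G(x;\zeta)}\right]e^{-\lambda G(x;\zeta)}\right\}, \quad x \in \mathbb{R}.$$

Thus, by considering the pdf corresponding to $G(x;\zeta)$, represented by $g(x;\zeta)$, the pdf of the TML-G family is given by

$$f_{TML-G}(x;\lambda,\zeta) = d_\lambda\, g(x;\zeta)\left[(1+\lambda)e^{\lambda G(x;\zeta)} + 2\lambda G(x;\zeta) - 1\right]e^{-2\lambda G(x;\zeta)}, \quad x \in \mathbb{R}.$$

Based on the two functions above, the hazard rate function (hrf) follows immediately:

$$h_{TML-G}(x;\lambda,\zeta) = \frac{f_{TML-G}(x;\lambda,\zeta)}{1 - F_{TML-G}(x;\lambda,\zeta)} = \frac{d_\lambda\, g(x;\zeta)\left[(1+\lambda)e^{\lambda G(x;\zeta)} + 2\lambda G(x;\zeta) - 1\right]e^{-2\lambda G(x;\zeta)}}{1 - c_\lambda\left\{1 - \left[1 + \frac{\lambda G(x;\zeta)}{1+\lambda}e^{-\lambda G(x;\zeta)}\right]e^{-\lambda G(x;\zeta)}\right\}}, \quad x \in \mathbb{R}.$$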


The following series expansion holds for $f_{TML-G}(x;\lambda,\zeta)$.

Proposition 3.1 The pdf of the TML-G distribution can be expressed via pdfs of the E-G family as

$$f_{TML-G}(x;\lambda,\zeta) = \sum_{k=0}^{\infty} u_k^* f_{E-G}(x;k+1,\zeta) + \sum_{k=0}^{\infty} v_k^* f_{E-G}(x;k+2,\zeta),$$

where $f_{E-G}(x;k+1,\zeta) = (k+1)\,g(x;\zeta)\,G(x;\zeta)^k$ is the pdf of the E-G family with power parameter $k+1$, $u_k^* = u_k/(k+1)$ and $v_k^* = v_k/(k+2)$, with $u_k$ and $v_k$ defined in (3).

This result is an immediate consequence of Proposition 2.1. The proof is thus omitted.

A possible functional approximation of $f_{TML-G}(x;\lambda,\zeta)$ is obtained by truncating both sums at a sufficiently large integer K:

$$f_{TML-G}(x;\lambda,\zeta) \approx \sum_{k=0}^{K} u_k^* f_{E-G}(x;k+1,\zeta) + \sum_{k=0}^{K} v_k^* f_{E-G}(x;k+2,\zeta).$$


Moment-type measures can be derived from Proposition 3.1. As an example, for a random variable X with a distribution in the TML-G family, the rth moment of X is given by

$$\mu_r' = E(X^r) = \sum_{k=0}^{\infty} u_k^* \mu_r^*(k) + \sum_{k=0}^{\infty} v_k^* \mu_r^*(k+1),$$

where $\mu_r^*(k) = \int_{-\infty}^{+\infty} x^r f_{E-G}(x;k+1,\zeta)\,dx$ denotes the rth moment of a random variable with a distribution in the E-G family with power parameter $k+1$. It can be approximated, for a sufficiently large integer K, as

$$\mu_r' \approx \sum_{k=0}^{K} u_k^* \mu_r^*(k) + \sum_{k=0}^{K} v_k^* \mu_r^*(k+1).$$

The interest of this approximation is mainly computational: the moments $\mu_r^*(k)$ are available in the literature for many parent distributions, and can be used to approximate $\mu_r'$, which can be very complex in its original integral definition. Based on the series expansion technique, other moment-type measures can be provided, such as incomplete moments and various entropies.


3.2 TML-Exponential (TML-E) distribution 


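In this subsection, we explore one member of the TML-G family, namely the TML-exponential (TML-E) distribution, and present its properties in detail. The exponential distribution was chosen because it is the simplest and most widely used lifetime model. In practice, other baseline distributions can be used to model real data.

Based on the cdf of the TML-G family, the TML-E distribution is defined by the following cdf:

$$F_{TML-E}(x;\lambda,\theta) = c_\lambda\left\{1 - \left[1 + \frac{\lambda\left(1 - e^{-x/\theta}\right)}{1+\lambda}e^{-\lambda\left(1 - e^{-x/\theta}\right)}\right]e^{-\lambda\left(1 - e^{-x/\theta}\right)}\right\}, \quad x > 0,$$

with $\lambda > 0$, $\theta > 0$ and $F_{TML-E}(x;\lambda,\theta) = 0$ for $x \le 0$. Thus, by considering the pdf of the exponential distribution, the pdf of the TML-E distribution is given by

$$f_{TML-E}(x;\lambda,\theta) = \frac{d_\lambda}{\theta}e^{-x/\theta}\left[(1+\lambda)e^{\lambda\left(1 - e^{-x/\theta}\right)} + 2\lambda\left(1 - e^{-x/\theta}\right) - 1\right]e^{-2\lambda\left(1 - e^{-x/\theta}\right)}, \quad x > 0,$$

and $f_{TML-E}(x;\lambda,\theta) = 0$ for $x \le 0$. According to the two functions above, the hrf follows immediately:

$$h_{TML-E}(x;\lambda,\theta) = \frac{f_{TML-E}(x;\lambda,\theta)}{1 - F_{TML-E}(x;\lambda,\theta)} = \frac{(d_\lambda/\theta)\,e^{-x/\theta}\left[(1+\lambda)e^{\lambda\left(1 - e^{-x/\theta}\right)} + 2\lambda\left(1 - e^{-x/\theta}\right) - 1\right]e^{-2\lambda\left(1 - e^{-x/\theta}\right)}}{1 - c_\lambda\left\{1 - \left[1 + \frac{\lambda\left(1 - e^{-x/\theta}\right)}{1+\lambda}e^{-\lambda\left(1 - e^{-x/\theta}\right)}\right]e^{-\lambda\left(1 - e^{-x/\theta}\right)}\right\}}, \quad x > 0,$$

and $h_{TML-E}(x;\lambda,\theta) = 0$ for $x \le 0$.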
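
The three TML-E functions can be coded directly from their definitions. The Python sketch below is illustrative only: it assumes the exponential baseline $G(x;\theta) = 1 - e^{-x/\theta}$ and the normalising constant $c_\lambda$ equal to the reciprocal of the untruncated cdf at 1, and the function names are hypothetical.

```python
import math

def _c(lam):
    """Normalising constant c_lambda (assumed: reciprocal of the untruncated cdf at 1)."""
    e = math.exp(-lam)
    return 1.0 / (1.0 - (1.0 + lam / (1.0 + lam) * e) * e)

def tmle_cdf(x, lam, theta):
    """cdf of the TML-E distribution."""
    if x <= 0.0:
        return 0.0
    g = 1.0 - math.exp(-x / theta)  # exponential baseline cdf G(x; theta)
    return _c(lam) * (1.0 - (1.0 + lam * g / (1.0 + lam) * math.exp(-lam * g)) * math.exp(-lam * g))

def tmle_pdf(x, lam, theta):
    """pdf of the TML-E distribution."""
    if x <= 0.0:
        return 0.0
    g = 1.0 - math.exp(-x / theta)
    d = _c(lam) * lam / (1.0 + lam)
    inner = (1.0 + lam) * math.exp(lam * g) + 2.0 * lam * g - 1.0
    return d / theta * math.exp(-x / theta) * inner * math.exp(-2.0 * lam * g)

def tmle_hrf(x, lam, theta):
    """Hazard rate: pdf divided by the survival function."""
    return tmle_pdf(x, lam, theta) / (1.0 - tmle_cdf(x, lam, theta))

print(tmle_cdf(0.7, 1.0, 0.9), tmle_pdf(0.7, 1.0, 0.9), tmle_hrf(0.7, 1.0, 0.9))
```

A simple sanity check is that a finite-difference derivative of `tmle_cdf` agrees with `tmle_pdf`.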


3.3 Estimation and applications 


By considering the TML-E model, we estimate the corresponding parameters, collected into a vector, say $\Theta = (\lambda,\theta)$, by the maximum likelihood method. Our estimated vector, say $\hat\Theta$, satisfies

$$\hat\Theta = \operatorname{argmax}_{\Theta} L(x_1,\ldots,x_n;\Theta),$$

where $x_1,\ldots,x_n$ denote realisations of independent random variables $X_1,\ldots,X_n$ with the TML-E distribution, and $L(x_1,\ldots,x_n;\Theta)$ is the likelihood function defined by

$$L(x_1,\ldots,x_n;\Theta) = \prod_{i=1}^n f_{TML-E}(x_i;\lambda,\theta).$$

Alternatively, the logarithm of the likelihood function can be used: $\hat\Theta = \operatorname{argmax}_{\Theta}\log L(x_1,\ldots,x_n;\Theta)$. The derived estimates are known to have good probabilistic properties, allowing confidence intervals to be calculated and statistical tests to be performed on $\Theta$ and its components. The SEs and other basic properties of the estimates can be computed numerically in standard statistical software, such as R or SAS.


3.4 Data fitting 


In this section, we offer two real-world illustrations that demonstrate the versatility of the TML-E model, and we compare the TML-E distribution with the Lindley and exponential distributions.


Data set 1: The first data set provides the time between failures for a repairable item; for more details, see Murthy et al. (2004).


Data set 2: The second real data set consists of 30 consecutive values of March precipitation (in inches) in Minneapolis/St Paul, given by Hinkley (1977).


The summaries for the fitted TML-E, Lindley, and exponential distributions for the two data sets are presented in Tables 2 and 3. Since the smallest -logL, AIC, BIC, K-S statistic and the highest p-values are obtained for the TML-E model, it can be considered the best. The estimated pdfs for data sets 1 and 2 are presented in Figures 2a and 3a. In addition, the cdfs for each model are compared to the empirical distribution function in Figures 2b and 3b. Therefore, we conclude that the TML-E model provides good fits to these data.
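
The information criteria can be recomputed from the reported -logL values. The short Python sketch below uses the standard definitions AIC = 2k + 2(-logL) and BIC = k log(n) + 2(-logL), with k the number of estimated parameters; it takes the -logL values reported in Table 3 for data set 2 and n = 30, the number of precipitation values stated above. The function names are illustrative.

```python
import math

def aic(neg_loglik, k):
    """Akaike Information Criterion from the minimised negative log-likelihood."""
    return 2.0 * k + 2.0 * neg_loglik

def bic(neg_loglik, k, n):
    """Bayesian Information Criterion for a sample of size n."""
    return k * math.log(n) + 2.0 * neg_loglik

n = 30  # data set 2: 30 consecutive March precipitation values
aic_tmle, bic_tmle = aic(38.5082, 2), bic(38.5082, 2, n)            # TML-E, two parameters
aic_lindley, bic_lindley = aic(43.1437, 1), bic(43.1437, 1, n)      # Lindley, one parameter
print(round(aic_tmle, 4), round(bic_tmle, 4), round(aic_lindley, 4), round(bic_lindley, 4))
```

These values agree with the AIC and BIC columns of Table 3 up to rounding.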


Table 2: Comparison criteria for data set 1.

Distribution   Estimates (SE)         -logL     AIC        BIC        K-S statistic   p-value
TML-E          λ = 0.9759 (0.6881)    39.8833   83.76661   86.569     0.0759          0.9952
               θ = 0.8794 (0.1875)
Lindley        θ = 0.9762 (0.1345)    41.5473   85.0947    86.4958    0.1407          0.5928
Exponential    θ = 0.6482 (0.1183)    43.0054   88.0108    89.4120    0.1845          0.2590


Table 3: Comparison criteria for data set 2.

Distribution   Estimates (SE)         -logL     AIC        BIC        K-S statistic   p-value
TML-E          λ = [illegible]        38.5082   81.0164    83.81879   0.0582          0.9900
               θ = 0.7632 (0.1290)
Lindley        θ = 0.9096 (0.1247)    43.1437   88.2875    89.6886    0.18823         0.2383
Exponential    θ = 0.5970149 (0.1090) 45.4744   92.9488    94.3499    0.2352          0.0724

Figure 5: Fitted pdf plots for data set 2.


4. Conclusions 


In this chapter, we present a novel one-parameter model, the truncated-(0,1) ML (TML) distribution, and a new generated family, the TML-Generated (TML-G) family. We derive several structural properties of this new family, and investigate some properties of the new TML-Exponential (TML-E) model. Maximum likelihood is used to estimate the model parameters. Two real-world examples show that the proposed TML-E distribution fits better than the Lindley and exponential models.


120 G Families of Probability Distributions: Theory and Practices 


References 


Bantan, R. A. R., Chesneau, C., Jamal, F., Elgarhy, M., Tahir, M. H. et al. (2020). Some new facts about the unit-Rayleigh 
distribution with applications, Mathematics, 8, 11, 1954, 1-23. 

Chesneau, C., Tomy, L. and Gillariose, J. (2019). A new modified Lindley distribution with properties and applications, Journal
of Statistics and Management Systems, DOI: 10.1080/09720510.2020.1824727.

Chesneau, C., Tomy, L., Gillariose, J. and Jamal, F. (2020a). The inverted modified lindley distribution. Journal of Statistical 
Theory and Practice, 14: 1-17. 

Chesneau, C., Tomy, L. and Gillariose, J. (2020b). On a sum and difference of two Lindley distributions: theory and applications, 
REVSTAT- Statistical Journal, 18: 673-695. 

Chesneau, C., Tomy, L. and Jose, M. (2020c). Wrapped modified Lindley distribution. Journal of Statistics and Management 
Systems, DOI: 10.1080/09720510.2020.1796313. 

Chesneau, C., Tomy, L. and Gillariose, J. (2021). On a new distribution based on the arccosine function, Arabian Journal of 
Mathematics, 10(3): https://doi.org/10.1007/s40065-021-00337-x. 

Hinkley, D. (1977). On quick choice of power transformations. Applied Statistics, 26: 67-69. 

Klein, J. P. and Moeschberger, M. L. (2006). Survival Analysis: Techniques for Censored and Truncated Data; Springer: Berlin/ 
Heidelberg, Germany. 

Lindley, D. V. (1958). Fiducial distributions and Bayes theorem. Journal of the Royal Statistical Society, 20: 102-107. 

Murthy, D. N. P., Xie, M. and Jiang, R. (2004). Weibull Models, Wiley series in probability and statistics, John Wiley & Sons, 
NJ. 


Chapter 3 


An Extension of the Weibull Distribution 
via Alpha Logarithmic G Family with 
Associated Quantile Regression Modeling 
and Applications 

Yunus Akdoğan, Kadir Karakaya, Mustafa Korkmaz, Fatih Şahin and Aşır Genç


1. Introduction 


Lifetime models are used to model data defined on $(0,\infty)$ and are generally preferred in fields such as reliability engineering and life data analysis. In recent years, the need for data modeling has increased. Many distributions have been introduced recently by Alizadeh et al. (2020), Korkmaz (2020), Tanis et al. (2021), Rasekhi et al. (2019), and others. Also, several families of distributions have been introduced recently using transformations of well-known distribution functions. For instance, the exponentiated family introduced by Mudholkar and Srivastava (1993) is presented as

$$F(x) = F_0(x)^{\alpha}, \quad \alpha > 0,$$

where $F_0(x)$ is the cumulative distribution function (cdf) of the baseline model. Mahdavi and Kundu (2017) introduced another generalized family called the α-power family. This family is specialized for the exponential, Pareto and Weibull distributions in the studies of Mahdavi and Kundu (2017), Kinaci et al. (2019) and Nassar et al. (2017), respectively.

The α-logarithmic family is defined by Karakaya et al. (2017), who selected the exponential distribution as the baseline distribution. The family is as follows: let $F_0(x)$ be the cdf of a baseline random variable X; then the cdf of the α-logarithmic transformation, for $x \in \mathbb{R}$, is given by

$$F_{ALT}(x) = \frac{\log(1 + \alpha F_0(x))}{\log(1+\alpha)}, \qquad (1)$$

where $\alpha > -1$, $\alpha \neq 0$. This family is known as ALT, and it is easily seen that $F_{ALT}(x) \to F_0(x)$ as $\alpha \to 0$. The probability density function (pdf) of the ALT family is given as follows:

$$f_{ALT}(x) = \frac{\alpha f_0(x)}{\log(1+\alpha)\,(1 + \alpha F_0(x))}, \qquad (2)$$
where $f_0$ is the pdf of the baseline distribution. In reliability research, the Weibull distribution is one of the most used lifetime distributions. It is used to model data in fields such as medicine, biology, physics and economics. However, the Weibull distribution is inadequate for some applications. To overcome this, many new distributions based on the Weibull distribution have been proposed. In this study, we aim to bring a new perspective to the Weibull distribution by selecting it as the baseline distribution in the ALT family introduced by Karakaya et al. (2017). This paper is organized as follows. In Section 2, some distributional properties of the new distribution are examined. Inference for the new model is discussed in Section 3, using five different estimation methods. In Section 4, extensive Monte Carlo simulation studies are performed to investigate the efficiency of the five estimators. On the basis of the new model, a new quantile regression model is constructed in Section 5, and its parameters are estimated by the maximum likelihood method. Finally, Section 6 applies the proposed models to real data sets.


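2. Alpha logarithmic Weibull distribution


In this section, the ALT method is applied to the Weibull distribution, and the resulting model is called the α-logarithmic Weibull distribution (ALWD). Let X follow the Weibull distribution with parameters $\beta, \theta > 0$. Recall that the associated cdf is given by

$$F_0(x) = 1 - \exp\left\{-\left(\frac{x}{\beta}\right)^{\theta}\right\}, \quad x > 0. \qquad (3)$$

Using the Weibull cdf (3) as $F_0(x)$ in (1), the cdf and pdf of the ALWD are obtained, respectively, as

$$F(x;\Xi) = \begin{cases} 0, & x \le 0, \\ \dfrac{\log\left[1 + \alpha\left\{1 - \exp\left(-(x/\beta)^{\theta}\right)\right\}\right]}{\log(1+\alpha)}, & x > 0, \end{cases} \qquad (4)$$

and

$$f(x;\Xi) = \frac{\alpha\theta}{\beta}\,\frac{(x/\beta)^{\theta-1}\exp\left\{-(x/\beta)^{\theta}\right\}}{\log(1+\alpha)\left[1 + \alpha\left\{1 - \exp\left(-(x/\beta)^{\theta}\right)\right\}\right]}\,1_{\mathbb{R}_+}(x), \qquad (5)$$

where $\alpha > -1$, $\beta > 0$ and $\theta > 0$ are parameters, $\Xi = (\alpha,\beta,\theta)$ and $1_A(\cdot)$ is the indicator function on the set A. When the random variable X has an ALWD with pdf given in (5), it is briefly denoted by ALWD($\Xi$). The ALWD includes some special sub-distributions as follows:
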


• The exponential distribution with scale parameter β (when θ = 1 and α → 0).
• The Weibull distribution with scale parameter β and shape parameter θ (when α → 0).
• The gamma distribution with shape parameter 2 and scale parameter β (when θ = 1 and α → 0).
• The generalized gamma distribution with shape parameters 2, α and scale parameter λ (when α → 0).
• The Rayleigh distribution with scale parameter σ (when θ = 2, β = σ√2 and α → 0).


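The quantile function of the ALWD can be easily obtained by inverting its cdf. Hence, the uth quantile is

$$x_u(\Xi) = \beta\left\{-\log\left[\frac{\alpha + 1 - (1+\alpha)^u}{\alpha}\right]\right\}^{1/\theta},$$

where $0 < u < 1$. The hazard rate function (hrf) of the ALWD is obtained as

$$h(x;\Xi) = \frac{\alpha\theta\,(x/\beta)^{\theta}\exp\left\{-(x/\beta)^{\theta}\right\}}{x\left\{\log\left[1 + \alpha - \alpha\exp\left(-(x/\beta)^{\theta}\right)\right] - \log(1+\alpha)\right\}\left\{\alpha\exp\left(-(x/\beta)^{\theta}\right) - \alpha - 1\right\}}, \quad x > 0.$$
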

The pdf and hrf plots of the ALWD are given in Figure 1 for some choices of the parameter values α, β and θ.

Figure 1: Pdf and hrf plots for different parameter values α, β and θ.

It is concluded from Figure 1 that the pdf of the ALWD can be increasing, decreasing or unimodal. It is also concluded from Figure 1 that the hrf is increasing when α < 0 and decreasing when α > 0. It is also seen that the hrf is constant when θ = 1 and α → 0.
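
The cdf, pdf and quantile function of the ALWD given above translate directly into code, and the quantile function gives a simple inverse-transform sampler. The following Python sketch is illustrative: the function names, the seed handling and the parameter values are assumptions, not part of the original study.

```python
import math
import random

def alwd_cdf(x, alpha, beta, theta):
    """ALWD cdf: log(1 + alpha * F0(x)) / log(1 + alpha), with Weibull baseline F0."""
    if x <= 0.0:
        return 0.0
    f0 = 1.0 - math.exp(-((x / beta) ** theta))
    return math.log1p(alpha * f0) / math.log1p(alpha)

def alwd_pdf(x, alpha, beta, theta):
    """ALWD pdf: alpha * f0(x) / (log(1 + alpha) * (1 + alpha * F0(x)))."""
    if x <= 0.0:
        return 0.0
    z = (x / beta) ** theta
    f0 = 1.0 - math.exp(-z)
    dens0 = theta / beta * (x / beta) ** (theta - 1.0) * math.exp(-z)  # Weibull pdf
    return alpha * dens0 / (math.log1p(alpha) * (1.0 + alpha * f0))

def alwd_quantile(u, alpha, beta, theta):
    """Inverse of the ALWD cdf for 0 <= u < 1."""
    ratio = (alpha + 1.0 - (1.0 + alpha) ** u) / alpha
    # max(...) guards against a tiny negative value caused by rounding when u is near 0
    return beta * max(0.0, -math.log(ratio)) ** (1.0 / theta)

def alwd_sample(n, alpha, beta, theta, seed=0):
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [alwd_quantile(rng.random(), alpha, beta, theta) for _ in range(n)]

draws = alwd_sample(5, alpha=1.0, beta=10.0, theta=2.0)
print(draws, alwd_quantile(0.5, 1.0, 10.0, 2.0))
```

The sampler works for negative α as well, since the ratio inside the logarithm stays in (0, 1) for every α > -1, α ≠ 0.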

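The rth moment of the ALWD, for $\alpha > -1/2$ and $\alpha \neq 0$, is given as follows:

$$E(X^r) = \frac{\alpha\,\beta^r}{(1+\alpha)\log(1+\alpha)}\,\Gamma\left(\frac{r}{\theta}+1\right)\Phi\left(\frac{\alpha}{1+\alpha},\frac{r}{\theta}+1,1\right),$$

where $\Phi(z,s,v) = \sum_{n=0}^{\infty}(v+n)^{-s}z^n$ is the Lerch function. The expected value is

$$E(X) = \frac{\alpha\,\beta}{(1+\alpha)\log(1+\alpha)}\,\Gamma\left(\frac{1}{\theta}+1\right)\Phi\left(\frac{\alpha}{1+\alpha},\frac{1}{\theta}+1,1\right).$$

Note that when $\theta = 1$, $E(X) \to \beta$ as $\alpha \to 0$ and $E(X) \to 0$ as $\alpha \to \infty$. Moreover, when $\theta = 1$, $Var(X) \to \beta^2$ as $\alpha \to 0$ and $Var(X) \to 0$ as $\alpha \to \infty$.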


2.1 Order statistics 


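The order statistics of the ALWD($\Xi$) distribution provide some interesting findings. Let $X_1, X_2, \ldots, X_n$ be a random sample from the ALWD($\Xi$) distribution and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the corresponding order statistics. The cdf and pdf of the rth order statistic, $X_{(r)}$, are given by

$$F_{X_{(r)}}(x) = \sum_{k=r}^{n}\sum_{l=0}^{n-k}(-1)^l\binom{n}{k}\binom{n-k}{l}\left[\frac{\log\left[1 + \alpha\left\{1 - \exp\left(-(x/\beta)^{\theta}\right)\right\}\right]}{\log(1+\alpha)}\right]^{k+l}$$

and

$$f_{X_{(r)}}(x) = \frac{f(x;\Xi)}{B(r,n-r+1)}\sum_{k=0}^{n-r}(-1)^k\binom{n-r}{k}\left[\frac{\log\left[1 + \alpha\left\{1 - \exp\left(-(x/\beta)^{\theta}\right)\right\}\right]}{\log(1+\alpha)}\right]^{r+k-1},$$

respectively, where $f(x;\Xi)$ is the pdf in (5), $B(\cdot,\cdot)$ is the beta function and $r = 1,2,\ldots,n$. In particular, the cdf and pdf of $X_{(1)} = \min\{X_1,X_2,\ldots,X_n\}$ and $X_{(n)} = \max\{X_1,X_2,\ldots,X_n\}$ are obtained for $r = 1$ and $r = n$, respectively.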


2.2 Stochastic ordering 


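Shaked and Shanthikumar (2007) studied stochastic orders and their many applications. Stochastic orders are an important tool for comparing random variables. We provide some essential definitions. A random variable X is said to be smaller than a random variable Y in the

• stochastic order ($X \le_{st} Y$) if $F_X(x) \ge F_Y(x)$ for all x;
• mean residual life order ($X \le_{mrl} Y$) if $m_X(x) \le m_Y(x)$ for all x;
• likelihood ratio order ($X \le_{lr} Y$) if $f_X(x)/f_Y(x)$ decreases in x;
• hazard rate order ($X \le_{hr} Y$) if $h_X(x) \ge h_Y(x)$ for all x.

Theorem 2.1 If $X \sim ALWD(\alpha_1,\beta,\theta)$, $Y \sim ALWD(\alpha_2,\beta,\theta)$ and $\alpha_1 > \alpha_2$, then $X \le_{lr} Y$.

Corollary 2.1 If $X \sim ALWD(\alpha_1,\beta,\theta)$, $Y \sim ALWD(\alpha_2,\beta,\theta)$ and $\alpha_1 > \alpha_2$, then $X \le_{lr} Y \Rightarrow X \le_{hr} Y \Rightarrow X \le_{mrl} Y$ and $X \le_{hr} Y \Rightarrow X \le_{st} Y$.

Proof. For $\alpha_1 > \alpha_2$ and $x \in (0,\infty)$, the likelihood ratio $W(x) = f_X(x)/f_Y(x)$ of the ALWD is given by

$$W(x) = \frac{\alpha_1\log(1+\alpha_2)\left[1 + \alpha_2\left\{1 - \exp\left(-(x/\beta)^{\theta}\right)\right\}\right]}{\alpha_2\log(1+\alpha_1)\left[1 + \alpha_1\left\{1 - \exp\left(-(x/\beta)^{\theta}\right)\right\}\right]}.$$

Taking the derivative with respect to x,

$$W'(x) = \frac{\alpha_1\log(1+\alpha_2)}{\alpha_2\log(1+\alpha_1)}\cdot\frac{\theta}{\beta}\left(\frac{x}{\beta}\right)^{\theta-1}\exp\left\{-\left(\frac{x}{\beta}\right)^{\theta}\right\}\frac{\alpha_2 - \alpha_1}{\left[1 + \alpha_1\left\{1 - \exp\left(-(x/\beta)^{\theta}\right)\right\}\right]^2}.$$

All factors other than $\alpha_2 - \alpha_1$ are positive, and $\alpha_2 - \alpha_1$ is smaller than zero. So $W'(x) < 0$ when $\alpha_1 > \alpha_2$, that is, $W(x)$ is a decreasing function of x. The proof is thus completed.

Theorem 2.1 shows that the ALWDs are ordered with respect to these stochastic orderings.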


3. Parameter estimation 


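In this section, several estimation methods are examined for the three unknown parameters of the ALWD. We discuss the maximum likelihood (ML), least-squares (LS), weighted least-squares (WLS), Anderson-Darling (AD), and Cramér-von Mises (CvM) methods of estimation. Let $X_1, X_2, \ldots, X_n$ be a random sample from the ALWD, and let $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ denote the corresponding order statistics. Also, $x_{(j)}$ represents the observed value of $X_{(j)}$, $j = 1,2,\ldots,n$. Then the log-likelihood function is given by

$$\ell(\Xi) = n\log\left(\frac{\alpha}{\log(1+\alpha)}\right) + n\log\theta - n\log\beta + (\theta-1)\sum_{j=1}^{n}\log\left(\frac{x_j}{\beta}\right) - \sum_{j=1}^{n}\left(\frac{x_j}{\beta}\right)^{\theta} - \sum_{j=1}^{n}\log\left[1 + \alpha\left\{1 - \exp\left(-\left(\frac{x_j}{\beta}\right)^{\theta}\right)\right\}\right]. \qquad (6)$$

The ML estimator of $\Xi$ is obtained as

$$\hat\Xi_{MLE} = \arg\max_{(\alpha,\theta,\beta)\in((-1,+\infty)\setminus\{0\})\times(0,+\infty)\times(0,+\infty)}\{\ell(\Xi)\}. \qquad (7)$$

Then, using (4), the following equation can be written:

$$F(x_{(j)};\Xi) = \frac{\log\left[1 + \alpha\left\{1 - \exp\left(-(x_{(j)}/\beta)^{\theta}\right)\right\}\right]}{\log(1+\alpha)}, \quad j = 1,2,\ldots,n.$$

Let us define the following four functions, which are used to obtain the other estimates:

$$Q_{LS}(\Xi) = \sum_{j=1}^{n}\left(F(x_{(j)};\Xi) - \frac{j}{n+1}\right)^2,$$

$$Q_{WLS}(\Xi) = \sum_{j=1}^{n}\frac{(n+1)^2(n+2)}{j(n-j+1)}\left(F(x_{(j)};\Xi) - \frac{j}{n+1}\right)^2,$$

$$Q_{CvM}(\Xi) = \frac{1}{12n} + \sum_{j=1}^{n}\left(F(x_{(j)};\Xi) - \frac{2j-1}{2n}\right)^2,$$

and

$$Q_{AD}(\Xi) = -n - \frac{1}{n}\sum_{j=1}^{n}(2j-1)\left\{\log F(x_{(j)};\Xi) + \log\left(1 - F(x_{(n+1-j)};\Xi)\right)\right\}.$$

Then, the LS, WLS, CvM and AD estimators of $\Xi$ are given, respectively, by

$$\hat\Xi_{LSE} = \arg\min_{(\alpha,\theta,\beta)\in((-1,+\infty)\setminus\{0\})\times(0,+\infty)\times(0,+\infty)}\{Q_{LS}(\Xi)\}, \qquad (8)$$

$$\hat\Xi_{WLSE} = \arg\min_{(\alpha,\theta,\beta)\in((-1,+\infty)\setminus\{0\})\times(0,+\infty)\times(0,+\infty)}\{Q_{WLS}(\Xi)\}, \qquad (9)$$

$$\hat\Xi_{CvME} = \arg\min_{(\alpha,\theta,\beta)\in((-1,+\infty)\setminus\{0\})\times(0,+\infty)\times(0,+\infty)}\{Q_{CvM}(\Xi)\}, \qquad (10)$$

$$\hat\Xi_{ADE} = \arg\min_{(\alpha,\theta,\beta)\in((-1,+\infty)\setminus\{0\})\times(0,+\infty)\times(0,+\infty)}\{Q_{AD}(\Xi)\}. \qquad (11)$$

All optimization problems given in Equations (7)-(11) can be solved with numerical methods such as BFGS or Nelder-Mead.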
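
The four distance-based objective functions depend on the data only through the fitted cdf values at the ordered observations. The Python sketch below evaluates them for a list `F` holding $F(x_{(1)};\Xi),\ldots,F(x_{(n)};\Xi)$; the function names are hypothetical and the AD criterion is written in its standard form. Minimising any of these over the parameters then requires a numerical optimiser, which is not shown.

```python
import math

def q_ls(F):
    """Least-squares distance between fitted cdf values and j / (n + 1)."""
    n = len(F)
    return sum((F[j - 1] - j / (n + 1)) ** 2 for j in range(1, n + 1))

def q_wls(F):
    """Weighted least squares with weights (n + 1)^2 (n + 2) / (j (n - j + 1))."""
    n = len(F)
    return sum((n + 1) ** 2 * (n + 2) / (j * (n - j + 1)) * (F[j - 1] - j / (n + 1)) ** 2
               for j in range(1, n + 1))

def q_cvm(F):
    """Cramer-von Mises criterion."""
    n = len(F)
    return 1.0 / (12 * n) + sum((F[j - 1] - (2 * j - 1) / (2 * n)) ** 2 for j in range(1, n + 1))

def q_ad(F):
    """Anderson-Darling criterion (standard form); F[n - j] is the value at x_(n+1-j)."""
    n = len(F)
    s = sum((2 * j - 1) * (math.log(F[j - 1]) + math.log(1.0 - F[n - j])) for j in range(1, n + 1))
    return -n - s / n

# Fitted cdf values that match the plotting positions exactly give the minimal LS and WLS values.
n = 5
F_plot = [j / (n + 1) for j in range(1, n + 1)]
print(q_ls(F_plot), q_wls(F_plot), q_cvm(F_plot), q_ad(F_plot))
```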
4. Simulation study 


Consider N = 1000 trials of size n = 50, 55, ..., 1000 from the ALWD with true parameter values α = 1, β = 10 and θ = 2. All estimates are obtained via the constrOptim routine in R. The bias and mean square errors (MSEs) are computed, for $\xi = \alpha, \beta, \theta$, by

$$Bias_{\xi}(n) = \frac{1}{N}\sum_{j=1}^{N}\left(\hat\xi_j - \xi_0\right)$$

and

$$MSE_{\xi}(n) = \frac{1}{N}\sum_{j=1}^{N}\left(\hat\xi_j - \xi_0\right)^2,$$

respectively, where $\hat\xi_j$ is the estimate obtained in the jth trial and $\xi_0$ is the true value. The results are shown in Figures 2-4. From Figures 2-4 it can be said that the estimates are biased but asymptotically unbiased. Also, as n increases, the bias and MSE of the estimators decrease, as expected.
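
The bias and MSE summaries are simple averages over the Monte Carlo replications. A minimal Python sketch follows; the list of estimates is made up for illustration and the function names are hypothetical.

```python
def bias(estimates, true_value):
    """Average deviation of the estimates from the true parameter value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def mse(estimates, true_value):
    """Average squared deviation of the estimates from the true parameter value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Hypothetical estimates of a parameter whose true value is 1.0.
estimates = [1.1, 0.9, 1.2]
print(bias(estimates, 1.0), mse(estimates, 1.0))
```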
Figure 2: The empirical means, bias and MSEs of the parameter α.
Figure 4: The empirical means, bias and MSEs of the parameter θ.


5. ALWD quantile regression model


When the response is defined on the positive domain, the gamma regression model, proposed by Cuervo (2001), comes to mind to construct the linear relation between the response variable and the independent variables (covariates, regressors). The gamma regression aims to model the conditional mean of a response variable via the covariates. On re-parameterizing the ordinary gamma distribution, the following pdf is obtained:

$$f(y;\alpha,\mu) = \frac{1}{y\,\Gamma(\alpha)}\left(\frac{\alpha y}{\mu}\right)^{\alpha}\exp\left(-\frac{\alpha y}{\mu}\right), \quad y > 0, \qquad (12)$$

where $\alpha > 0$ is the shape parameter and $\mu > 0$ is the expected value of the model. Now, using an appropriate link function, the covariates can be linked to the mean of the model. In the gamma regression model, the regression structure is given by

$$g(\mu_i) = x_i\gamma^T,$$

where $\gamma = (\gamma_0,\gamma_1,\gamma_2,\ldots,\gamma_p)$ and $x_i = (1,x_{i1},x_{i2},x_{i3},\ldots,x_{ip})$ are the unknown regression parameter vector and the known ith vector of covariates. The function $g(\cdot)$ is the link function. When the response variable has a skewed distribution or outliers, the gamma regression can be affected. In such cases, robust regression models are more suitable than the mean response regression model. For this reason, the quantile regression (Koenker and Bassett, 1978) can be seen as a good robust alternative. Now, as an alternative to the gamma mean response regression model, we propose the ALWD quantile regression model. Using the quantile function of the ALWD, let $\mu = x_u(\Xi)$ and

$$\beta = \mu\left\{-\log\left[\frac{\alpha + 1 - (1+\alpha)^u}{\alpha}\right]\right\}^{-1/\theta}$$

in (5). The following pdf is obtained based on this re-parametrization:

$$g(y;\alpha,\theta,\mu,u) = \frac{\alpha\theta\,\{-\log A_u\}\,y^{\theta-1}A_u^{(y/\mu)^{\theta}}}{\mu^{\theta}\log(1+\alpha)\left[1 + \alpha\left(1 - A_u^{(y/\mu)^{\theta}}\right)\right]}, \quad y > 0, \qquad (13)$$

where, for brevity, $A_u = (\alpha + 1 - (1+\alpha)^u)/\alpha$, $\alpha > -1$ and $\theta > 0$ are parameters, $\mu > 0$ is the quantile parameter, and $u \in (0,1)$ is known. The random variable Y is denoted by $Y \sim ALWD(\alpha,\theta,\mu,u)$. This re-parametrization allows the covariates to be linked to the quantile of the ALWD random variable. We use the log-link function to link the covariates to the quantiles of the $ALWD(\alpha,\theta,\mu,u)$ model via $\log(\mu_i) = x_i\gamma^T$ for $i = 1,2,\ldots,n$. If u is equal to 0.5, the covariates are linked to the conditional median of the response variable.
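
The key property of the re-parametrization is that μ is the uth quantile of the response, so the cdf evaluated at y = μ must return u. The Python sketch below checks this numerically; the function names are hypothetical, and the parameter values are arbitrary illustrations.

```python
import math

def beta_from_quantile(mu, u, alpha, theta):
    """Scale parameter beta implied by the u-th quantile mu."""
    ratio = (alpha + 1.0 - (1.0 + alpha) ** u) / alpha
    return mu * (-math.log(ratio)) ** (-1.0 / theta)

def alwd_q_cdf(y, alpha, theta, mu, u):
    """cdf of the re-parameterized ALWD, obtained by plugging beta into the ALWD cdf."""
    if y <= 0.0:
        return 0.0
    beta = beta_from_quantile(mu, u, alpha, theta)
    f0 = 1.0 - math.exp(-((y / beta) ** theta))
    return math.log1p(alpha * f0) / math.log1p(alpha)

def mu_from_covariates(x_row, gamma):
    """Log link: mu_i = exp(x_i gamma^T)."""
    return math.exp(sum(x * g for x, g in zip(x_row, gamma)))

mu_i = mu_from_covariates([1.0, 10.0], [-0.05, -0.01])  # illustrative covariate row and coefficients
print(mu_i, alwd_q_cdf(mu_i, 2.0, 1.5, mu_i, 0.5))      # the second value should be close to 0.5
```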


5.1 MLE method for the parameters of ALWD quantile regression model 


Based on the ML method, the log-likelihood function of the ALWD quantile regression model is given, for $\Psi = (\alpha,\theta,\gamma)^T$ and $\mu_i = \exp(x_i\gamma^T)$, by

$$\ell(\Psi) = n\log\left(\frac{\alpha\theta}{\log(1+\alpha)}\right) + n\log\left(-\log A_u\right) + (\theta-1)\sum_{i=1}^{n}\log y_i - \theta\sum_{i=1}^{n}\log\mu_i + \log(A_u)\sum_{i=1}^{n}\left(\frac{y_i}{\mu_i}\right)^{\theta} - \sum_{i=1}^{n}\log\left[1 + \alpha\left(1 - A_u^{(y_i/\mu_i)^{\theta}}\right)\right], \qquad (14)$$

where $A_u = (\alpha + 1 - (1+\alpha)^u)/\alpha$. Maximizing (14), the ML estimators can be obtained directly via the maxLik function in the R software. The standard errors of the estimated parameters can also be obtained with this function.


5.2 Model checking


After model fitting, residual analysis plays an important role in checking the model. For this purpose, we focus on the randomized quantile residuals (rqrs) of Dunn and Smyth (1996). For $i = 1,\ldots,n$, the ith randomized quantile residual is defined by

$$\hat r_i = \Phi^{-1}\left[G(y_i;\hat\alpha,\hat\theta,\hat\mu_i,u)\right],$$

where $G(y;\alpha,\theta,\mu,u)$ is the cdf of the re-parameterized ALWD and $\Phi^{-1}(x)$ is the quantile function of the standard normal distribution. If the model is valid, then the rqrs have a N(0,1) distribution.
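
Once the fitted cdf values $G(y_i;\hat\alpha,\hat\theta,\hat\mu_i,u)$ are available, the residuals follow from the standard normal quantile function. The Python sketch below uses `statistics.NormalDist` from the standard library; the function name and the input probabilities are illustrative.

```python
from statistics import NormalDist

def quantile_residuals(cdf_values):
    """Map fitted cdf values in (0, 1) to standard normal quantiles."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(p) for p in cdf_values]

# Illustrative fitted cdf values, not taken from the fitted model.
residuals = quantile_residuals([0.5, 0.975, 0.025])
print(residuals)
```

For a continuous response such as the ALWD, no randomization step is needed: the cdf value is used as it is.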


6. Real data applications 


In this section, a practical example is considered to observe the modeling capability of the ALWD. The number of failures for the air conditioning system of jet airplanes was examined. This data set was analyzed by Cordeiro and Lemonte (2011). Using ML estimation, we fit the ALWD to the jet airplanes data and compare the ALWD with the exponential (E), Weibull (W), gamma (G) and Rayleigh (R) distributions based on the estimated log-likelihood values (-2ℓ), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Kolmogorov-Smirnov (K-S) goodness-of-fit statistic and the related p-value. All calculations mentioned above are obtained with the BFGS method of the R function optim. From Table 1, the ALWD can be selected as the best distribution because it has the lowest values of -2ℓ, AIC and K-S statistic, together with the highest K-S p-value; only its BIC is larger than that of some competitors. These results show that the new proposed model is competitive.

Second, we point out the applicability of the ALWD quantile regression model as an alternative to the gamma regression model. The data set consists of the environment indicator of the Better Life Index (BLI) values of the OECD countries as well as Brazil, Russia and South Africa. The data set can be extracted from https://stats.oecd.org/index.aspx?DataSetCode=BLI2017.

The idea is to set the linear relation between water quality (WQ) and air pollution (AP). The regression structure for $\mu_j$ is given by

$$\log(\mu_j) = \gamma_0 + \gamma_1 AP_j, \quad j = 1,\ldots,39.$$

Table 2 gives the ML estimates, their standard errors (SEs), and the model selection criteria for the ALWD quantile and gamma regression models.

From Table 2, at the u = 0.5 quantile level, the coefficient of the covariate is statistically significant in both models at the usual significance levels. The effect is negative: water quality decreases as air pollution increases. Moreover, the log-likelihood value of the proposed quantile regression model is larger than that of the gamma regression model, and its AIC and BIC are smaller. These inferences are also supported by the Q-Q plots of the rqrs, with simulated envelopes, shown in Figure 5.


Table 1. MLEs, -2ℓ and goodness-of-fit statistics for the jet airplanes data (K-S p-values given in parentheses).

Model        ALWD        E           W           G           R
-2ℓ          2070.0236   2076.4966   2073.5022   2075.2246   2382.5502
AIC          2076.0237   2078.4967   2077.5023   2079.2247   2384.5504
BIC          2085.7330   2081.7332   2083.9752   2085.6976   2387.7868
K-S          0.0547      0.0845      0.0572      0.0703      0.4003
(p-value)    (0.6282)    (0.1368)    (0.5703)    (0.3114)    (0.0000)
α            2.0025      -           -           0.9049      -
β            74.6601     -           0.9109      0.0098      100.1541
θ            0.6130      0.0108      87.7565     -           -


Table 2. The results of the ALWD quantile and gamma regression models with model selection criteria.

              ALWD                              Gamma
Parameters    Estimate   SE       p-value      Estimate   SE       p-value
γ0            -0.0631    0.0330   0.0555       -0.0392    0.0459   0.3930
γ1            -0.0085    0.0025   0.0005       -0.0118    0.0031   0.0001
α             -0.9277    0.3222   0.0039       76.6523    0.7689   < 0.0001
θ             9.0136     2.9760   0.0025
ℓ             40.2193                          37.1693
AIC           -72.4385                         -68.3385
BIC           -65.7843                         -63.3478


Figure 5: The Q-Q plot of the rqrs based on the fitted data set (panels: randomized residuals for the ALWD quantile regression and for the gamma regression, each plotted against normal quantiles).


7. Conclusions 


A new lifetime distribution based on the Weibull distribution is proposed. The new distribution is derived from the ALT method. Many distributional properties of the new distribution have been studied. The unknown parameters of the new distribution have been estimated by five methods. A comprehensive simulation study was conducted to observe the performances of the five estimators. A new regression model has been proposed as an alternative to the gamma regression model. Its usefulness in data modeling was shown via two applications, one to a lifetime data set and one to a regression data set.


Chapter 9


The Topp-Leone-G Power Series 
Distribution 

Its Properties and Applications 

Laba Handique,** Subrata Chakraborty and M Masoom Ali 


MS Classification: 60E05; 62E15; 62F10. 


1. Introduction 


The Topp-Leone distribution is commonly used in many applied problems, particularly in lifetime data analysis, and has attracted various statisticians as an alternative to the beta distribution. A generalization of this distribution is the Topp-Leone-G family of distributions. The distribution function of the Topp-Leone-G family is given by Ali et al. (2016) as

$$F^{TLG}(x;\alpha) = G(x)^{\alpha}\left[2 - G(x)\right]^{\alpha}; \quad x \in \mathbb{R},\ \alpha > 0. \qquad (1)$$

The probability density function (pdf) corresponding to (1) is given by

$$f^{TLG}(x;\alpha) = 2\alpha\,g(x)\,\bar G(x)\,G(x)^{\alpha-1}\left[2 - G(x)\right]^{\alpha-1}, \qquad (2)$$

where $g(x)$ and $\bar G(x) = 1 - G(x)$ are the pdf and the survival function (sf) of the baseline distribution.

Roozegar and Nadarajah (2017) defined the Topp-Leone power series distribution with sf and pdf

$$\bar F(x;\lambda,\alpha,\beta) = \frac{\Phi\left(\lambda\left[1 - G(x;\alpha,\beta)\right]\right)}{\Phi(\lambda)} \quad \text{and} \quad f(x;\lambda,\alpha,\beta) = \frac{\lambda\,g(x;\alpha,\beta)\,\Phi'\left(\lambda\left[1 - G(x;\alpha,\beta)\right]\right)}{\Phi(\lambda)},$$

where $g(x;\alpha,\beta)$ and $G(x;\alpha,\beta)$ are the pdf and cdf of the two-parameter Topp-Leone distribution, and $\Phi(\cdot)$ and $\Phi'(\cdot)$ are defined below.
In this article, we introduce a new extension of the Topp-Leone-Power series family of distributions by 
compounding Topp-Leone-G by power series distributions, to get the Topp-Leone-G power series distribution. 
Given N = 2, let_X/s(i= 1,2,..., 2) be independent and identically distributed (iid) random variables from 
the Topp-Leone-G family of distributions whose cdf is given by eq. (1). Consider N to be a discrete random 
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variable independent of X/s from a power series distribution, truncated at zero, with the probability mass 
function given by, 
a Ae 
P(N =n)=p,=2 : n=1, 2: 3:;.. 3 
(N=n)=2,= 35 3) 


where, a, > 0 depends only on n and @(1) = Ya, A", 2 €(0,s)(s can be +o0) such that H(A) is finite and 


its first, second and third derivatives exist and are denoted by &(-), &'(-) and }'(-). Table 1 shows some 


useful quantities including a,, P(A), BA) and 6"), (A) and @'(A) for some power series distributions 
(truncated at zero) such as Poisson, geometric, logarithmic, binomial and negative binomial distributions. 
Suppose that the failure time of each sub system has the cumulative distribution function (cdf) 
equation (1). Let Y, denote the failure time of the i” subsystem and X denote the time to failure of the 
first out of the V functioning subsystems that is ¥ = min {Y,,Y,,...,Y,,}. Then the conditional cdf of X given 
Nis F(x/N=n)=1-Pr(X¥ >x/N=n)=1-P(Y, > x)" =1-[l-F™*(x;a)]’. Observe that X/N = n is a 
Kumaraswamy Tope-Leone-G (KwTLG) random variable with parameters a and n which we symbolized by 
KwTLG(a,n). So, the unconditional cdf of X (for x > 0) can be expressed as, 
2a,4A" 


FQ)= FO" (a0) =! [I-F™9(x;a)]" )=1 


P(Afl— F"* (x; a)]) 
P(A) 


(4) 
This is the proposed Topp Leone-G power series (in short, TLGPS(a,/;Q), Q is the parameter of 
baseline distribution G.) distribution. The corresponding pdf of the TLGPS(a,/,;Q) family is given by, 
Af" (x;a) O'( ALL-F™ (x;0)] ) 
@ (A) 
The sf and hazard rate functions (hrf) of the TLGPS(a,/;Q) distribution are, 


fs fo" aA (5) 


@( AfI-F™*(x;,a)]) and ATS (x A;Q) = AL (a) @'( ApL-F™(x,a)]) 


ees % AD = 
(x; a,A;Q) @ (A) P( AlL-F™*(x;@)]) 


where, ®(A) and &‘.) are defined above. 
For G(x) =x, the TLGPS(a,/;Q) reduces to TLPS(a,A). 

The TLGPS($\alpha$, $\lambda$; $\Omega$) class of distributions contains several important families, including the Topp-Leone-G Poisson, Topp-Leone-G geometric, Topp-Leone-G logarithmic, Topp-Leone-G binomial and Topp-Leone-G negative binomial distributions. We derive some mathematical properties of this class.
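The compounding construction behind equation (4) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the Poisson case of Table 1 ($a_n = 1/n!$, $\Phi(\lambda) = e^{\lambda} - 1$) and an exponential baseline with rate 1.5, both chosen only for illustration, and all function names are hypothetical.

```python
import math

def tlg_cdf(g_cdf, alpha):
    """Topp-Leone-G cdf built from a baseline cdf G: [G(x) * (2 - G(x))]**alpha."""
    return lambda x: (g_cdf(x) * (2.0 - g_cdf(x))) ** alpha

def tlgps_cdf_closed(x, F, phi, lam):
    """Closed form in (4): 1 - Phi(lam * [1 - F(x)]) / Phi(lam)."""
    return 1.0 - phi(lam * (1.0 - F(x))) / phi(lam)

def tlgps_cdf_series(x, F, a, phi, lam, terms=100):
    """Series form in (4): sum over n of p_n * (1 - [1 - F(x)]**n), truncated."""
    u = 1.0 - F(x)
    return sum(a(n) * lam ** n / phi(lam) * (1.0 - u ** n)
               for n in range(1, terms + 1))

# Poisson row of Table 1 (assumed case): a_n = 1/n!, Phi(lam) = exp(lam) - 1.
a_pois = lambda n: 1.0 / math.factorial(n)
phi_pois = lambda lam: math.expm1(lam)

# Illustrative exponential baseline G(x) = 1 - exp(-1.5 x).
F = tlg_cdf(lambda x: 1.0 - math.exp(-1.5 * x), alpha=2.0)

closed = tlgps_cdf_closed(0.7, F, phi_pois, lam=1.2)
series = tlgps_cdf_series(0.7, F, a_pois, phi_pois, lam=1.2, terms=60)
print(closed, series)  # the two values should agree to rounding error
```

Swapping in another row of Table 1 only requires changing `a_pois` and `phi_pois`; for series with a finite radius of convergence the truncation point `terms` may need to be larger.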


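Proposition 1: The KwTLG($\alpha$, $c$) distribution with parameters $\alpha$ and $c$ is a limiting distribution of the TLGPS distribution:

$$\lim_{\lambda \to 0^+} F^{TLGPS}(x;\alpha,\lambda) = 1 - \lim_{\lambda \to 0^+} \frac{\Phi\left(\lambda[1 - F^{TLG}(x;\alpha)]\right)}{\Phi(\lambda)} = 1 - [1 - F^{TLG}(x;\alpha)]^c,$$

where $c = \min\{n \in \mathbb{N} : a_n > 0\}$.


Table 1: Useful quantities for some power series distributions.

| Distribution | $\Phi(\lambda)$ | $\Phi'(\lambda)$ | $\Phi''(\lambda)$ | $\Phi^{-1}(\lambda)$ | $a_n$ | $s$ |
|---|---|---|---|---|---|---|
| Poisson | $e^{\lambda} - 1$ | $e^{\lambda}$ | $e^{\lambda}$ | $\log(1+\lambda)$ | $(n!)^{-1}$ | $\infty$ |
| Geometric | $\lambda(1-\lambda)^{-1}$ | $(1-\lambda)^{-2}$ | $2(1-\lambda)^{-3}$ | $\lambda(1+\lambda)^{-1}$ | $1$ | $1$ |
| Logarithmic | $-\log(1-\lambda)$ | $(1-\lambda)^{-1}$ | $(1-\lambda)^{-2}$ | $1 - e^{-\lambda}$ | $n^{-1}$ | $1$ |
| Binomial | $(1+\lambda)^m - 1$ | $m(1+\lambda)^{m-1}$ | $m(m-1)(1+\lambda)^{m-2}$ | $(1+\lambda)^{1/m} - 1$ | $\binom{m}{n}$ | $\infty$ |
| Negative binomial | $\dfrac{\lambda^m}{(1-\lambda)^m}$ | $\dfrac{m\lambda^{m-1}}{(1-\lambda)^{m+1}}$ | $\dfrac{m(m+2\lambda-1)\lambda^{m-2}}{(1-\lambda)^{m+2}}$ | $\dfrac{\lambda^{1/m}}{1+\lambda^{1/m}}$ | $\binom{n-1}{m-1}$ | $1$ |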
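The limit in Proposition 1 can be justified in one step. Writing $u = 1 - F^{TLG}(x;\alpha)$ (a shorthand introduced here only for this remark) and using nothing beyond the definition $\Phi(\lambda) = \sum_{n \ge c} a_n\lambda^n$, with $a_c$ the first non-zero coefficient, the leading terms dominate as $\lambda \to 0^+$:

```latex
\frac{\Phi(\lambda u)}{\Phi(\lambda)}
  = \frac{a_c(\lambda u)^c + a_{c+1}(\lambda u)^{c+1} + \cdots}
         {a_c\lambda^c + a_{c+1}\lambda^{c+1} + \cdots}
  = u^c\,\frac{1 + (a_{c+1}/a_c)\,\lambda u + \cdots}
              {1 + (a_{c+1}/a_c)\,\lambda + \cdots}
  \;\longrightarrow\; u^c
  \qquad (\lambda \to 0^+),
```

so that $1 - \Phi(\lambda u)/\Phi(\lambda) \to 1 - [1 - F^{TLG}(x;\alpha)]^c$, which is the KwTLG($\alpha$, $c$) cdf.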


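Proposition 2: The pdf of the TLGPS distribution can be written as an infinite mixture of the pdfs of the KwTLG($\alpha$, $n$) distribution with parameters $\alpha$ and $n$.


Proof: By using eq. (5), we have

$$f^{TLGPS}(x;\alpha,\lambda) = \sum_{n=1}^{\infty} \frac{a_n\lambda^n}{\Phi(\lambda)}\left\{ n f^{TLG}(x;\alpha)[1 - F^{TLG}(x;\alpha)]^{n-1}\right\} = \sum_{n=1}^{\infty} p_n\, g(x;\alpha,n), \tag{6}$$

since $\Phi'(\lambda) = \sum_{n=1}^{\infty} n a_n \lambda^{n-1}$, where $g(x;\alpha,n) = n f^{TLG}(x;\alpha)[1 - F^{TLG}(x;\alpha)]^{n-1}$ is the pdf of KwTLG($\alpha$, $n$) and $p_n$ is defined above.

Using the series expansion $(1-z)^k = \sum_{j=0}^{k}\binom{k}{j}(-z)^j$ we can, therefore, write

$$f^{TLGPS}(x;\alpha,\lambda) = \sum_{n=1}^{\infty}\sum_{j=0}^{n-1} p_n\,\mu_j\, f^{TLG}(x;\alpha)[F^{TLG}(x;\alpha)]^{j} \tag{7}$$

$$= \sum_{n=1}^{\infty}\sum_{j=0}^{n-1} \delta_{n,j}\, f^{TLG}(x;\alpha)[F^{TLG}(x;\alpha)]^{j} = \sum_{n=1}^{\infty}\sum_{j=0}^{n-1} \delta'_{n,j}\, \frac{d}{dx}[F^{TLG}(x;\alpha)]^{j+1}, \tag{8}$$

where $\mu_j = n(-1)^j\binom{n-1}{j}$, $\delta_{n,j} = p_n\mu_j$ and $\delta'_{n,j} = \delta_{n,j}/(j+1)$.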


2. Special cases of the TLGPS distribution 


Here, we study basic distributional properties of the Topp-Leone-G Poisson (TLGP), Topp-Leone-G geometric (TLGG), Topp-Leone-G logarithmic (TLGL), Topp-Leone-G binomial (TLGB) and Topp-Leone-G negative binomial (TLGNB) distributions as special cases of the TLGPS distribution. Table 2 gives the pdf, sf, hrf and reversed hazard rate function (rhrf) of the TLGP, TLGG, TLGL, TLGB and TLGNB distributions, where $g^{TLG}(x;\alpha)$ and $G^{TLG}(x;\alpha)$ denote the pdf and cdf of the Topp-Leone-G family of distributions, respectively.

Special cases of the TLGP and TLGG families are listed below with their pdf and hrf. To illustrate the flexibility of these distributions, graphs of the pdf and hrf for some selected values of the parameters are presented in Figures 1 to 4.

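> Topp-Leone-Weibull Poisson (TLWP) distribution
Let the baseline distribution be Weibull (Weibull, 1951) with parameters $\beta > 0$ and $\theta > 0$ having pdf and cdf $g(x) = \beta\theta x^{\theta-1}e^{-\beta x^{\theta}}$ and $G(x) = 1 - e^{-\beta x^{\theta}}$, $x > 0$, respectively. Then we get the pdf and hrf of the TLWP($\lambda$, $\alpha$, $\beta$, $\theta$) distribution, respectively, as

$$f^{TLWP}(x;\lambda,\alpha,\beta,\theta) = \frac{2\lambda\alpha\beta\theta\, x^{\theta-1} e^{-2\beta x^{\theta}} \left(1 - e^{-2\beta x^{\theta}}\right)^{\alpha-1} \exp\left[\lambda\left\{1 - \left(1 - e^{-2\beta x^{\theta}}\right)^{\alpha}\right\}\right]}{e^{\lambda} - 1}$$

and

$$h^{TLWP}(x;\lambda,\alpha,\beta,\theta) = \frac{2\lambda\alpha\beta\theta\, x^{\theta-1} e^{-2\beta x^{\theta}} \left(1 - e^{-2\beta x^{\theta}}\right)^{\alpha-1} \exp\left[\lambda\left\{1 - \left(1 - e^{-2\beta x^{\theta}}\right)^{\alpha}\right\}\right]}{\exp\left[\lambda\left\{1 - \left(1 - e^{-2\beta x^{\theta}}\right)^{\alpha}\right\}\right] - 1}.$$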
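Table 2: pdf, sf, hrf and rhrf of the five special distributions. The arguments $(x;\alpha)$ of $g^{TLG}$ and $G^{TLG}$ are suppressed.

| Distribution | pdf | sf |
|---|---|---|
| TLGP | $\dfrac{\lambda g^{TLG}\exp[\lambda\{1-G^{TLG}\}]}{e^{\lambda}-1}$ | $\dfrac{\exp[\lambda\{1-G^{TLG}\}]-1}{e^{\lambda}-1}$ |
| TLGG | $\dfrac{(1-\lambda)\,g^{TLG}}{[1-\lambda\{1-G^{TLG}\}]^{2}}$ | $\dfrac{(1-\lambda)[1-G^{TLG}]}{1-\lambda[1-G^{TLG}]}$ |
| TLGL | $\dfrac{-\lambda g^{TLG}}{\{\ln(1-\lambda)\}[1-\lambda\{1-G^{TLG}\}]}$ | $\dfrac{\ln[1-\lambda\{1-G^{TLG}\}]}{\ln(1-\lambda)}$ |
| TLGB | $\dfrac{m\lambda g^{TLG}[1+\lambda\{1-G^{TLG}\}]^{m-1}}{(1+\lambda)^{m}-1}$ | $\dfrac{[1+\lambda\{1-G^{TLG}\}]^{m}-1}{(1+\lambda)^{m}-1}$ |
| TLGNB | $\dfrac{m(1-\lambda)^{m} g^{TLG}[1-G^{TLG}]^{m-1}}{[1-\lambda\{1-G^{TLG}\}]^{m+1}}$ | $\dfrac{[(1-\lambda)\{1-G^{TLG}\}]^{m}}{[1-\lambda\{1-G^{TLG}\}]^{m}}$ |

| Distribution | hrf | rhrf |
|---|---|---|
| TLGP | $\dfrac{\lambda g^{TLG}\exp[\lambda\{1-G^{TLG}\}]}{\exp[\lambda\{1-G^{TLG}\}]-1}$ | $\dfrac{\lambda g^{TLG}\exp[\lambda\{1-G^{TLG}\}]}{e^{\lambda}-\exp[\lambda\{1-G^{TLG}\}]}$ |
| TLGG | $\dfrac{g^{TLG}}{[1-\lambda\{1-G^{TLG}\}]\{1-G^{TLG}\}}$ | $\dfrac{(1-\lambda)\,g^{TLG}}{G^{TLG}\,[1-\lambda\{1-G^{TLG}\}]}$ |
| TLGL | $\dfrac{-\lambda g^{TLG}[1-\lambda\{1-G^{TLG}\}]^{-1}}{\ln[1-\lambda\{1-G^{TLG}\}]}$ | $\dfrac{\lambda g^{TLG}}{[1-\lambda\{1-G^{TLG}\}]\left(\ln[1-\lambda\{1-G^{TLG}\}]-\ln(1-\lambda)\right)}$ |
| TLGB | $\dfrac{m\lambda g^{TLG}[1+\lambda\{1-G^{TLG}\}]^{m-1}}{[1+\lambda\{1-G^{TLG}\}]^{m}-1}$ | $\dfrac{m\lambda g^{TLG}[1+\lambda\{1-G^{TLG}\}]^{m-1}}{(1+\lambda)^{m}-[1+\lambda\{1-G^{TLG}\}]^{m}}$ |
| TLGNB | $\dfrac{m\,g^{TLG}[1-G^{TLG}]^{-1}}{1-\lambda\{1-G^{TLG}\}}$ | $\dfrac{m(1-\lambda)^{m} g^{TLG}[1-G^{TLG}]^{m-1}[1-\lambda\{1-G^{TLG}\}]^{-1}}{[1-\lambda\{1-G^{TLG}\}]^{m}-[1-G^{TLG}]^{m}(1-\lambda)^{m}}$ |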


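> Taking $\theta = 1$ in TLWP($\lambda$, $\alpha$, $\beta$, $\theta$) we get the TLEP($\lambda$, $\alpha$, $\beta$) with pdf and hrf, respectively, as

$$f^{TLEP}(x;\lambda,\alpha,\beta) = \frac{2\lambda\alpha\beta\, e^{-2\beta x}\left(1-e^{-2\beta x}\right)^{\alpha-1}\exp\left[\lambda\left\{1-\left(1-e^{-2\beta x}\right)^{\alpha}\right\}\right]}{e^{\lambda}-1}$$

and

$$h^{TLEP}(x;\lambda,\alpha,\beta) = \frac{2\lambda\alpha\beta\, e^{-2\beta x}\left(1-e^{-2\beta x}\right)^{\alpha-1}\exp\left[\lambda\left\{1-\left(1-e^{-2\beta x}\right)^{\alpha}\right\}\right]}{\exp\left[\lambda\left\{1-\left(1-e^{-2\beta x}\right)^{\alpha}\right\}\right]-1}.$$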


Obviously a large number of particular cases of TLGP distributions can be generated by assuming 
different baseline distributions. 
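The TLEP case is simple enough to code directly. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the cdf follows from (4) with $\Phi(\lambda) = e^{\lambda} - 1$, the quantile function is obtained by inverting that cdf using $\Phi^{-1}(\lambda) = \log(1+\lambda)$ from Table 1, and the function names and parameter values are hypothetical.

```python
import math
import random

def tlep_cdf(x, lam, alpha, beta):
    """TLEP cdf: 1 - [exp(lam * (1 - w)) - 1] / [exp(lam) - 1], w = (1 - e^{-2 beta x})^alpha."""
    w = (1.0 - math.exp(-2.0 * beta * x)) ** alpha
    return 1.0 - math.expm1(lam * (1.0 - w)) / math.expm1(lam)

def tlep_pdf(x, lam, alpha, beta):
    """TLEP pdf as displayed above."""
    e = math.exp(-2.0 * beta * x)
    w = (1.0 - e) ** alpha
    return (2.0 * lam * alpha * beta * e * (1.0 - e) ** (alpha - 1.0)
            * math.exp(lam * (1.0 - w)) / math.expm1(lam))

def tlep_hrf(x, lam, alpha, beta):
    """Hazard rate as pdf / survival function."""
    return tlep_pdf(x, lam, alpha, beta) / (1.0 - tlep_cdf(x, lam, alpha, beta))

def tlep_quantile(u, lam, alpha, beta):
    """Inverse of tlep_cdf, derived here by solving cdf(x) = u for x."""
    w = 1.0 - math.log1p((1.0 - u) * math.expm1(lam)) / lam
    return -math.log(1.0 - w ** (1.0 / alpha)) / (2.0 * beta)

# Inverse-transform sampling with illustrative parameter values.
rng = random.Random(1)
sample = [tlep_quantile(rng.random(), 1.5, 2.0, 0.8) for _ in range(5)]
print(sample)
```

The same pattern should carry over to the other rows of Table 1 by replacing `expm1` and `log1p` with the corresponding $\Phi$ and $\Phi^{-1}$.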

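Special cases of the TLGG family of distributions and their main distributional characteristics follow:


> Topp-Leone-Weibull Geometric (TLWG) distribution
Let the baseline distribution be Weibull (Weibull, 1951) with parameters $\beta > 0$ and $\theta > 0$ having pdf and cdf $g(x) = \beta\theta x^{\theta-1}e^{-\beta x^{\theta}}$ and $G(x) = 1 - e^{-\beta x^{\theta}}$, $x > 0$, respectively. Then we get the pdf and hrf of the TLWG($\lambda$, $\alpha$, $\beta$, $\theta$) distribution, respectively, as

$$f^{TLWG}(x;\lambda,\alpha,\beta,\theta) = \frac{(1-\lambda)\,2\alpha\beta\theta\, x^{\theta-1}e^{-2\beta x^{\theta}}\left(1-e^{-2\beta x^{\theta}}\right)^{\alpha-1}}{\left[1-\lambda\left\{1-\left(1-e^{-2\beta x^{\theta}}\right)^{\alpha}\right\}\right]^{2}}$$

and

$$h^{TLWG}(x;\lambda,\alpha,\beta,\theta) = \frac{2\alpha\beta\theta\, x^{\theta-1}e^{-2\beta x^{\theta}}\left(1-e^{-2\beta x^{\theta}}\right)^{\alpha-1}}{\left[1-\lambda\left\{1-\left(1-e^{-2\beta x^{\theta}}\right)^{\alpha}\right\}\right]\left\{1-\left(1-e^{-2\beta x^{\theta}}\right)^{\alpha}\right\}}.$$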
Figure 1: pdf plots of the TLEP and TLWP distributions.
Figure 2: hrf plots of the TLEP and TLWP distributions.


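> Taking $\theta = 1$ in TLWG($\lambda$, $\alpha$, $\beta$, $\theta$) we get the TLEG($\lambda$, $\alpha$, $\beta$) with pdf and hrf, respectively, as

$$f^{TLEG}(x;\lambda,\alpha,\beta) = \frac{(1-\lambda)\,2\alpha\beta\, e^{-2\beta x}\left(1-e^{-2\beta x}\right)^{\alpha-1}}{\left[1-\lambda\left\{1-\left(1-e^{-2\beta x}\right)^{\alpha}\right\}\right]^{2}}$$

and

$$h^{TLEG}(x;\lambda,\alpha,\beta) = \frac{2\alpha\beta\, e^{-2\beta x}\left(1-e^{-2\beta x}\right)^{\alpha-1}}{\left[1-\lambda\left\{1-\left(1-e^{-2\beta x}\right)^{\alpha}\right\}\right]\left\{1-\left(1-e^{-2\beta x}\right)^{\alpha}\right\}}.$$

Figure 3: pdf plots of the TLEG and TLWG distributions.
Figure 4: hrf plots of the TLEG and TLWG distributions.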


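3. Statistical properties of TLGPS($\alpha$, $\lambda$)


3.1 Moment generating function


The moment generating function (mgf) of the TLGPS($\alpha$, $\lambda$) family can be easily expressed in terms of those of the exponentiated TLG($\alpha$) distribution using the results of Proposition 2. For example, using equation (8) and its coefficients $\delta'_{n,j}$, it can be seen that

$$M_X(s) = E[e^{sX}] = \int_0^{\infty} e^{sx} f^{TLGPS}(x;\alpha,\lambda)\,dx = \sum_{n=1}^{\infty}\sum_{j=0}^{n-1}\delta'_{n,j}\int_0^{\infty} e^{sx}\frac{d}{dx}[F^{TLG}(x;\alpha)]^{j+1}\,dx = \sum_{n=1}^{\infty}\sum_{j=0}^{n-1}\delta'_{n,j}\,M_j(s),$$

where $M_j(s)$ is the mgf of the exponentiated TLG($\alpha$) distribution with cdf $[F^{TLG}(x;\alpha)]^{j+1}$.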




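3.2 Probability weighted moments


Greenwood et al. (1979) first proposed the idea of probability weighted moments (PWMs) as the expectations of certain functions of a random variable whose mean exists. The $(p, q, r)$th PWM of $X$ is defined by

$$\Gamma_{p,q,r} = \int_{-\infty}^{\infty} x^{p}[F(x)]^{q}[1-F(x)]^{r} f(x)\,dx.$$

From equation (7), with $\delta_{n,j} = p_n\mu_j$, the $s$th moment of $X$ can be written as

$$E(X^{s}) = \int_0^{\infty} x^{s}\sum_{n=1}^{\infty}\sum_{j=0}^{n-1}\delta_{n,j}\, f^{TLG}(x;\alpha)[F^{TLG}(x;\alpha)]^{j}\,dx = \sum_{n=1}^{\infty}\sum_{j=0}^{n-1}\delta_{n,j}\,\Gamma_{s,j,0},$$

where $\Gamma_{p,q,r} = \int_0^{\infty} x^{p}\{F^{TLG}(x;\alpha)\}^{q}\{1-F^{TLG}(x;\alpha)\}^{r} f^{TLG}(x;\alpha)\,dx$ is the PWM of the TLG($\alpha$) distribution.

Thus, the moments of the TLGPS($\alpha$, $\lambda$) can be expressed in terms of the PWMs of TLG($\alpha$).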


3.3 Order statistics 


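Consider a random sample $X_1, X_2, \ldots, X_n$ from any TLGPS($\alpha$, $\lambda$) distribution and let $X_{i:n}$ denote the $i$th order statistic. The pdf of $X_{i:n}$ can be expressed as

$$f_{i:n}(x) = \frac{n!}{(i-1)!(n-i)!}\, f^{TLGPS}(x)[F^{TLGPS}(x)]^{i-1}[1-F^{TLGPS}(x)]^{n-i} = \frac{n!}{(i-1)!(n-i)!}\sum_{m=0}^{n-i}(-1)^{m}\binom{n-i}{m} f^{TLGPS}(x)[F^{TLGPS}(x)]^{m+i-1}. \tag{9}$$

Now, since $f(x)[F(x)]^{m+i-1} = \frac{1}{m+i}\frac{d}{dx}[F(x)]^{m+i}$, the cdf $F_{i:n}(x)$ of $f_{i:n}(x)$ can be expressed as

$$F_{i:n}(x) = \frac{n!}{(i-1)!(n-i)!}\sum_{m=0}^{n-i}\binom{n-i}{m}\frac{(-1)^{m}}{m+i}[F^{TLGPS}(x)]^{m+i} = \frac{n!}{(i-1)!(n-i)!}\sum_{m=0}^{n-i}\binom{n-i}{m}\frac{(-1)^{m}}{m+i}\left[1-\frac{\Phi\left(\lambda[1-F^{TLG}(x;\alpha)]\right)}{\Phi(\lambda)}\right]^{m+i}.$$

Taking $z = 1 - \dfrac{\Phi\left(\lambda[1-F^{TLG}(x;\alpha)]\right)}{\Phi(\lambda)}$ we can write

$$F_{i:n}(x) = \frac{n!}{(i-1)!(n-i)!}\sum_{m=0}^{n-i}\binom{n-i}{m}\frac{(-1)^{m}}{m+i}z^{m+i} = \frac{n!\,z^{i}}{(i-1)!(n-i)!}\sum_{m=0}^{\infty}\frac{(i-n)_m\,\Gamma(m+i)}{\Gamma(m+i+1)}\frac{z^{m}}{m!}$$

$$= \frac{n!\,z^{i}}{i\,(i-1)!(n-i)!}\sum_{m=0}^{\infty}\frac{(i-n)_m\,(i)_m}{(i+1)_m}\frac{z^{m}}{m!} = \frac{n!\,z^{i}}{i!(n-i)!}\;{}_2F_1(-n+i,\, i;\, i+1;\, z) \quad [\text{since for } m > n-i \text{ the summand is zero}]$$

$$= \binom{n}{i} z^{i}\;{}_2F_1(-n+i,\, i;\, i+1;\, z), \quad \text{for all } 1 \le i \le n \text{ and } x \ge 0,$$

where ${}_2F_1(a, b; c; z) = \sum_{k=0}^{\infty}\dfrac{(a)_k (b)_k}{(c)_k}\dfrac{z^{k}}{k!}$ is the Gauss hypergeometric function and $(a)_k = \dfrac{\Gamma(a+k)}{\Gamma(a)}$ denotes the Pochhammer symbol with the convention that $(a)_0 = 1$. Moreover, $F_{n:n}(x) = z^{n}$.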
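The hypergeometric form of $F_{i:n}(x)$ can be cross-checked against the elementary expression for the cdf of an order statistic, $\sum_{j=i}^{n}\binom{n}{j}z^{j}(1-z)^{n-j}$, which is a standard identity and not taken from the chapter. The sketch below uses only the terminating series, so no special-function library is assumed; the helper names are hypothetical and `z` stands for any value of $F^{TLGPS}(x)$ in (0, 1).

```python
import math

def hyp2f1_terminating(neg_int, b, c, z):
    """2F1(-N, b; c; z) for a non-positive integer first argument: a sum of N + 1 terms."""
    total, term = 1.0, 1.0
    for k in range(-neg_int):
        # Ratio of consecutive series terms: (a + k)(b + k) z / ((c + k)(k + 1)).
        term *= (neg_int + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def order_stat_cdf_hyp(i, n, z):
    """C(n, i) * z^i * 2F1(-n + i, i; i + 1; z), the closed form derived above."""
    return math.comb(n, i) * z ** i * hyp2f1_terminating(i - n, i, i + 1, z)

def order_stat_cdf_binom(i, n, z):
    """P(at least i of n observations are <= x) when each has probability z."""
    return sum(math.comb(n, j) * z ** j * (1.0 - z) ** (n - j)
               for j in range(i, n + 1))

for n, i, z in [(5, 2, 0.3), (7, 7, 0.6), (4, 1, 0.9)]:
    print(n, i, order_stat_cdf_hyp(i, n, z), order_stat_cdf_binom(i, n, z))
```

For $i = n$ the series reduces to its first term, which reproduces $F_{n:n}(x) = z^{n}$.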


4. Estimation 


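In this section, estimation of the parameters of the TLGPS($\alpha$, $\lambda$; $\Omega$), where $\Omega$ is the parameter of the baseline distribution G, is conducted using the maximum likelihood method. Let $x = (x_1, x_2, \ldots, x_n)$ be a random sample of size $n$ from the TLGPS($\alpha$, $\lambda$; $\Omega$) distribution with parameter vector $\rho = (\alpha, \lambda, \Omega)$; then the log-likelihood function for $\rho$ is given by

$$\ell = \ell(\rho) = n\log(2\alpha\lambda) - n\log[\Phi(\lambda)] + \sum_{i=1}^{n}\log[g(x_i)] + \sum_{i=1}^{n}\log[1-G(x_i)] + (\alpha-1)\sum_{i=1}^{n}\log[G(x_i)]$$

$$\qquad + (\alpha-1)\sum_{i=1}^{n}\log[2-G(x_i)] + \sum_{i=1}^{n}\log\Phi'\left(\lambda\left[1 - G(x_i)^{\alpha}\{2-G(x_i)\}^{\alpha}\right]\right).$$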


The MLEs are obtained by maximizing the log-likelihood function numerically by using an available 
function from R. 
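The log-likelihood above translates almost line by line into code. The chapter uses R; the sketch below is a Python analogue written only for illustration, with hypothetical function names and made-up data. It evaluates the general expression for the TLEP case (exponential baseline, Poisson $\Phi$) and compares it with the sum of log-densities computed directly from the TLEP pdf.

```python
import math

def tlgps_loglik(xs, alpha, lam, g_pdf, g_cdf, phi, dphi):
    """Log-likelihood of TLGPS(alpha, lam; Omega) for baseline pdf/cdf g, G and series Phi."""
    n = len(xs)
    ll = n * math.log(2.0 * alpha * lam) - n * math.log(phi(lam))
    for x in xs:
        G = g_cdf(x)
        ll += math.log(g_pdf(x)) + math.log(1.0 - G)
        ll += (alpha - 1.0) * (math.log(G) + math.log(2.0 - G))
        ll += math.log(dphi(lam * (1.0 - (G * (2.0 - G)) ** alpha)))
    return ll

def tlep_logpdf(x, lam, alpha, beta):
    """Log of the TLEP pdf, written out directly for comparison."""
    e = math.exp(-2.0 * beta * x)
    return (math.log(2.0 * lam * alpha * beta) - 2.0 * beta * x
            + (alpha - 1.0) * math.log(1.0 - e)
            + lam * (1.0 - (1.0 - e) ** alpha) - math.log(math.expm1(lam)))

beta = 0.8                                   # illustrative baseline rate
g_pdf = lambda x: beta * math.exp(-beta * x)
g_cdf = lambda x: 1.0 - math.exp(-beta * x)
phi = lambda t: math.expm1(t)                # Poisson row of Table 1
dphi = lambda t: math.exp(t)

xs = [0.5, 1.2, 2.0]                         # made-up observations
ll = tlgps_loglik(xs, 2.0, 1.5, g_pdf, g_cdf, phi, dphi)
direct = sum(tlep_logpdf(x, 1.5, 2.0, beta) for x in xs)
print(ll, direct)
```

To obtain MLEs, the negative of `tlgps_loglik` could be handed to any general-purpose numerical optimizer (for example `scipy.optimize.minimize` in Python, playing the role of the R routine mentioned above), with the parameters constrained or transformed to stay in their admissible ranges.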

The asymptotic variance-covariance matrix of the MLEs of the parameters can be obtained by inverting the Fisher information matrix $I_n(\rho)$, which can be derived using the second partial derivatives of the log-likelihood function with respect to each parameter. The $(i,j)$th elements of $I_n(\rho)$ are given by $I_{ij} = -E\left[\partial^{2}\ell(\rho)/\partial\rho_i\,\partial\rho_j\right]$, $i, j = 1, \ldots, 2+q$, where $q$ is the number of parameters in the G family.

The exact evaluation of the above expectations may be cumbersome. In practice one can estimate $I_n(\rho)$ by the observed Fisher information matrix $\hat{I}_n(\hat{\rho}) = (\hat{I}_{ij})$ defined as

$$\hat{I}_{ij} \approx \left.\left(-\partial^{2}\ell(\rho)/\partial\rho_i\,\partial\rho_j\right)\right|_{\rho=\hat{\rho}}, \quad i, j = 1, \ldots, 2+q.$$

Using the general theory of MLEs, under some regularity conditions on the parameters, as $n \to \infty$ the asymptotic distribution of $\sqrt{n}(\hat{\rho} - \rho)$ is $N_{2+q}(0, V_n)$, where $V_n = (v_{jj}) = I_n^{-1}(\rho)$. The asymptotic behaviour remains valid if $V_n$ is replaced by $\hat{V}_n = \hat{I}_n^{-1}(\hat{\rho})$. Using this result, the large sample standard error of the $j$th parameter $\rho_j$ is given by $\sqrt{\hat{v}_{jj}}$.


5. Applications 


In this section, we use two real data sets to show that the TLEP distribution can be a better model than the ones based on the exponential (Exp), moment exponential (ME), Marshall-Olkin exponential (MO-E) (Marshall and Olkin, 1997), generalized Marshall-Olkin exponential (GMO-E) (Jayakumar and Mathew, 2008), Kumaraswamy exponential (Kw-E) (Cordeiro and de Castro, 2011), Beta exponential (BE) (Eugene et al., 2002), Marshall-Olkin Kumaraswamy exponential (MOKw-E) (Handique et al., 2017), Kumaraswamy Marshall-Olkin exponential (KwMO-E) (Alizadeh et al., 2015), Beta Poisson exponential (BP-E) and Kumaraswamy Poisson exponential (KwP-E) (Chakraborty et al., 2022) distributions. The first data set represents the survival times (in days) of 72 guinea pigs infected with virulent tubercle bacilli, observed and reported by Bjerkedal (1960). The second data set consists of the relief times (in minutes) of patients receiving an analgesic; this data set of twenty (20) observations was reported by Gross and Clark (1975). Some well-known model selection criteria, namely the AIC, BIC, CAIC and HQIC, along with the Kolmogorov-Smirnov (K-S), Anderson-Darling (A) and Cramér-von Mises (W) goodness-of-fit statistics, are used to compare the fitted models. The findings are summarized in Tables 4, 5, 6 and 7, with a visual comparison of the fitted densities and fitted cdfs presented in Figures 6 and 7.
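The chapter does not state the formulas behind the selection criteria, so the following sketch rests on an assumption: the usual definitions, with CAIC taken to be the small-sample corrected AIC. The function name and the numerical inputs are hypothetical and serve only to show the mechanics; smaller values indicate a better trade-off between fit and the number of parameters `k`.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC, CAIC and HQIC from a maximized log-likelihood, k parameters, n observations.

    Assumed definitions (not stated in the chapter):
      AIC  = 2k - 2l
      BIC  = k ln(n) - 2l
      CAIC = AIC + 2k(k + 1) / (n - k - 1)
      HQIC = 2k ln(ln(n)) - 2l
    """
    aic = 2.0 * k - 2.0 * loglik
    bic = k * math.log(n) - 2.0 * loglik
    caic = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "BIC": bic, "CAIC": caic, "HQIC": hqic}

# Hypothetical inputs: a one-parameter model fitted to 72 observations.
ic = information_criteria(loglik=-116.315, k=1, n=72)
print(ic)
```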
TTT plots and descriptive statistics for the data sets:


The TTT plots (see Aarset, 1987) for the data sets in Figure 5 clearly indicate that both data sets have an increasing hazard rate.
Figure 5: TTT plots for data set I and data set II.


Table 3: Descriptive statistics for data sets I and II.

| Data set | n | Min. | Mean | Median | s.d. | Skewness | Kurtosis | 1st Qu. | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|---|---|---|---|
| I | 72 | 0.100 | 1.851 | 1.560 | 1.200 | 1.788 | 4.157 | 1.080 | 2.303 | 7.000 |
| II | 20 | 1.100 | 1.900 | 1.700 | 0.704 | 1.592 | 2.346 | 1.475 | 2.050 | 4.100 |


Table 4: MLEs with standard errors (in parentheses) for the guinea pigs survival times data set I.

| Models | $\lambda$ | $\alpha$ | $a$ | $b$ | $\beta$ |
|---|---|---|---|---|---|
| Exp ($\beta$) | --- | --- | --- | --- | 0.540 (0.063) |
| ME ($\beta$) | --- | --- | --- | --- | 0.925 (0.077) |
| MO-E ($\alpha$, $\beta$) | --- | 8.778 (3.555) | --- | --- | 1.379 (0.193) |
| GMO-E ($\lambda$, $\alpha$, $\beta$) | 0.179 (0.070) | 47.635 (44.901) | --- | --- | 4.465 (1.327) |
| Kw-E ($a$, $b$, $\beta$) | --- | --- | 3.304 (1.106) | 1.100 (0.764) | 1.037 (0.614) |
| B-E ($a$, $b$, $\beta$) | --- | --- | 0.807 (0.696) | 3.461 (1.003) | 1.331 (0.855) |
| MOKw-E ($\alpha$, $a$, $b$, $\beta$) | --- | 0.008 (0.002) | 2.716 (1.316) | 1.986 (0.784) | 0.099 (0.048) |
| KwMO-E ($\alpha$, $a$, $b$, $\beta$) | --- | 0.373 (0.136) | 3.478 (0.861) | 3.306 (0.779) | 0.299 (1.112) |
| BP-E ($a$, $b$, $\lambda$, $\beta$) | 0.014 (0.010) | --- | 3.595 (1.031) | 0.724 (1.590) | 1.482 (0.516) |
| KwP-E ($a$, $b$, $\lambda$, $\beta$) | 4.001 (5.670) | --- | 3.265 (0.991) | 2.658 (1.984) | 0.177 (0.226) |
| TL-EP ($\lambda$, $\alpha$, $\beta$) | -9.786 (9.422) | 0.438 (0.533) | --- | --- | 0.572 (0.085) |
Table 5: Log-likelihood, AIC, BIC, CAIC, HQIC, A, W and KS (p-value) values for the guinea pigs survival times data set I.

| Models | AIC | BIC | CAIC | HQIC | A | W | KS (p-value) |
|---|---|---|---|---|---|---|---|
| Exp ($\beta$) | 234.63 | 236.91 | 234.68 | 235.54 | 6.53 | 1.25 | 0.27 (0.06) |
| ME ($\beta$) | 210.40 | 212.68 | 210.45 | 211.30 | 1.52 | 0.25 | 0.14 (0.13) |
| MO-E ($\alpha$, $\beta$) | | 214.92 | 210.53 | 212.16 | 1.18 | 0.17 | 0.10 (0.43) |
| GMO-E ($\lambda$, $\alpha$, $\beta$) | 210.54 | 217.38 | 210.89 | 213.24 | 1.02 | 0.16 | 0.09 (0.51) |
| Kw-E ($a$, $b$, $\beta$) | 209.42 | 216.24 | 209.77 | 212.12 | 0.74 | 0.11 | 0.08 (0.50) |
| B-E ($a$, $b$, $\beta$) | 207.38 | 214.22 | 207.73 | 210.08 | 0.98 | 0.15 | 0.11 (0.34) |
| MOKw-E ($\alpha$, $a$, $b$, $\beta$) | 209.44 | 218.56 | 210.04 | 213.04 | 0.79 | 0.12 | 0.10 (0.44) |
| KwMO-E ($\alpha$, $a$, $b$, $\beta$) | 207.82 | 216.94 | 208.42 | 211.42 | 0.61 | 0.11 | 0.08 (0.73) |
| BP-E ($a$, $b$, $\lambda$, $\beta$) | | 214.50 | 206.02 | 209.02 | 0.55 | 0.08 | 0.09 (0.81) |
| KwP-E ($a$, $b$, $\lambda$, $\beta$) | 206.63 | 215.74 | 207.23 | 210.26 | 0.48 | 0.07 | 0.09 (0.79) |
| TL-EP ($\lambda$, $\alpha$, $\beta$) | 204.51 | 211.34 | 204.86 | 207.23 | 0.44 | 0.05 | 0.07 (0.84) |


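Table 6: MLEs with standard errors (in parentheses) for the failure time data set II.

| Models | $\lambda$ | $\alpha$ | $a$ | $b$ | $\beta$ |
|---|---|---|---|---|---|
| Exp ($\beta$) | --- | --- | --- | --- | 0.526 (0.117) |
| ME ($\beta$) | --- | --- | --- | --- | 0.950 (0.150) |
| MO-E ($\alpha$, $\beta$) | --- | 54.474 (35.582) | --- | --- | 2.316 (0.374) |
| GMO-E ($\lambda$, $\alpha$, $\beta$) | 0.519 (0.256) | 89.462 (66.278) | --- | --- | 3.169 (0.772) |
| Kw-E ($a$, $b$, $\beta$) | --- | --- | 83.756 (42.361) | 0.568 (0.326) | 3.330 (1.188) |
| B-E ($a$, $b$, $\beta$) | --- | --- | 81.633 (120.41) | 0.542 (0.327) | 3.514 (1.410) |
| MOKw-E ($\alpha$, $a$, $b$, $\beta$) | --- | 0.133 (0.332) | 33.232 (57.837) | 0.571 (0.721) | 1.669 (1.814) |
| KwMO-E ($\alpha$, $a$, $b$, $\beta$) | --- | 28.868 (9.146) | 34.826 (22.312) | 0.299 (0.239) | 4.899 (3.176) |
| BP-E ($a$, $b$, $\lambda$, $\beta$) | 1.965 (0.341) | --- | 13.396 (1.494) | 9.600 (1.091) | 0.244 (0.037) |
| KwP-E ($a$, $b$, $\lambda$, $\beta$) | 5.983 (1.470) | --- | 11.837 (6.493) | 3.596 (2.392) | 0.225 (0.098) |
| TL-EP ($\lambda$, $\alpha$, $\beta$) | 2.096 (2.208) | 26.116 (19.801) | --- | --- | 0.866 (0.320) |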
Table 7: Log-likelihood, AIC, BIC, CAIC, HQIC, A, W and KS (p-value) values for the failure time data set II.

| Models | AIC | BIC | CAIC | HQIC | A | W | KS (p-value) |
|---|---|---|---|---|---|---|---|
| Exp ($\beta$) | 67.67 | 68.67 | 67.89 | 67.87 | 4.60 | 0.96 | 0.44 (0.004) |
| ME ($\beta$) | 54.32 | 55.31 | 54.54 | 54.50 | 2.76 | 0.53 | 0.32 (0.07) |
| MO-E ($\alpha$, $\beta$) | 43.51 | 45.51 | 44.22 | 43.90 | 0.81 | 0.14 | 0.18 (0.55) |
| GMO-E ($\lambda$, $\alpha$, $\beta$) | 42.75 | 45.74 | 44.25 | 43.34 | 0.51 | 0.08 | 0.15 (0.78) |
| Kw-E ($a$, $b$, $\beta$) | 41.78 | 44.75 | 43.28 | 42.32 | 0.45 | 0.07 | 0.14 (0.86) |
| B-E ($a$, $b$, $\beta$) | 43.48 | 46.45 | 44.98 | 44.02 | 0.70 | 0.12 | 0.16 (0.80) |
| MOKw-E ($\alpha$, $a$, $b$, $\beta$) | 41.58 | 45.54 | 44.25 | 42.30 | 0.60 | 0.11 | 0.14 (0.87) |
| KwMO-E ($\alpha$, $a$, $b$, $\beta$) | 42.88 | 46.84 | 45.55 | 43.60 | 1.08 | 0.19 | 0.15 (0.86) |
| BP-E ($a$, $b$, $\lambda$, $\beta$) | 38.07 | 42.02 | 40.73 | 38.78 | 0.39 | 0.06 | 0.14 (0.91) |
| KwP-E ($a$, $b$, $\lambda$, $\beta$) | 38.32 | 42.28 | 40.98 | 39.04 | 0.41 | 0.05 | 0.13 (0.93) |
| TL-EP ($\lambda$, $\alpha$, $\beta$) | 37.65 | 40.63 | 39.15 | 38.23 | 0.22 | 0.03 | 0.10 (0.97) |
Figure 6: Plots of the observed histogram and estimated pdf (left) and of the observed ogive and estimated cdf (right) for data set I.


From these findings, based on the lowest values of the different criteria, the TLEP is found to be a better model than the Exp, ME, MO-E, GMO-E, Kw-E, B-E, MOKw-E, KwMO-E, BP-E and KwP-E models for the survival time data set. Thus the proposed distributions provide a comparatively closer fit to these data sets.
Figure 7: Plots of the observed histogram and estimated pdf (left) and of the observed ogive and estimated cdf (right) for data set II.


6. Conclusion 


The Topp-Leone power series distribution is extended by replacing the Topp-Leone distribution with the Topp-Leone generalized distribution. Some mathematical properties of the resulting family of distributions are derived, and maximum likelihood estimation is discussed. Two real-life applications are presented to showcase the advantage of members of the class over some competing distributions.
Chapter 10


Exponentiated Generalized General Class of Inverted Distributions: Estimation and Prediction


Abeer A EL-Helbawy,* Gannat R AL-Dayian,
Asmaa M Abd AL-Fattah and Rabab E Abd EL-Kader


1. Introduction 


There are different types of lifetime data in reliability studies, lifetime testing, human mortality studies, 
engineering modeling, electronic sciences and biological studies. Thus, different shapes of lifetime 
distributions are needed to fit these types of lifetime data. Various extensions and modifications have been 
suggested by researchers to construct new families of distributions which provide more flexibility and 
applicability than the existing distributions. These new families have been used for modeling data in many applied areas, e.g., engineering, economics, biological studies, environmental sciences, lifetime analysis, finance and insurance. Different methods for constructing, extending and generalizing lifetime distributions have been discussed in the literature; for more details see Lai (2013).

Some generalized distributions are constructed by adding new parameters to the baseline model; these are useful in deriving general results that can be applied to special cases to obtain new results. Marshall and Olkin (1997) added a positive parameter to a general survival function to obtain the Marshall-Olkin generated family, and Eugene et al. (2002) introduced the beta generating family. The Kumaraswamy generating family was presented by Jones (2009), and the exponentiated generalized (EG) class was provided by Cordeiro et al. (2013) as an extension of the exponentiated type distribution. The transformed-transformer (T-X) and exponentiated T-X families were considered by Alzaatreh et al. (2013). Yousof et al. (2015) introduced the transmuted EG-G family of distributions. The alpha power transformation was proposed by Mahdavi and Kundu (2017). Sindi et al. (2017) studied the exponentiated general class of distributions, Oluyede et al. (2020) obtained the EG power series family of distributions, and Baharith and Aljuhani (2021) proposed a new method for generating distributions which combines two techniques: the T-X and alpha power transformation approaches.

The inverted distributions have a wide range of applications in problems related to econometrics, biological sciences, survey sampling, engineering sciences, medical research and life testing. Many authors have focused on the exponentiated distributions and their applications; for example, Nadarajah and Kotz (2006), Ali et al. (2007), Silva et al. (2010), Lemonte et al. (2013), Elgarhy and Shawki (2017) and Rather and Subramanian (2018).

Cordeiro et al. (2013) proposed a class of distributions as an extension of the exponentiated type distribution which has great flexibility in its tails and can be widely applied in many areas of biology and engineering. Given a non-negative continuous random variable X, the cumulative distribution function (cdf) of the EG general class (EGGC) of distributions is defined by

$$F(x;\alpha,\beta) = \{1 - [1 - G(x)]^{\alpha}\}^{\beta}, \quad \alpha, \beta > 0, \tag{1}$$

where $\alpha$ and $\beta$ are additional shape parameters. The corresponding probability density function (pdf) for (1) is given by

$$f(x;\alpha,\beta) = \alpha\beta\, g(x)[1 - G(x)]^{\alpha-1}\{1 - [1 - G(x)]^{\alpha}\}^{\beta-1}, \quad \alpha, \beta > 0. \tag{2}$$

By setting $\alpha = 1$ in (1), the exponentiated type distribution derived by Gupta et al. (1998) is obtained; further, the exponentiated exponential and exponentiated gamma distributions can be derived if $G(x)$ is the exponential or gamma cdf. If $\beta = 1$ in (1) and $G(x)$ is the Gumbel or Fréchet cdf, then one can get the exponentiated Gumbel and exponentiated Fréchet distributions, respectively, as defined by Nadarajah and Kotz (2006). Thus, the class of distributions (1) extends both exponentiated type distributions.

The general class of distributions is important for obtaining general results that can be applied to special cases to obtain new results, and it is more flexible in dealing with statistical problems. Many authors have focused on the generalized and EG distributions and their applications; for example, Oguntunde et al. (2014), Yousof et al. (2015), De Andrade et al. (2016), Mustafa et al. (2016) and Sindi et al. (2017).

Abd EL-Kader (2013) proposed a general class of inverted distributions with a positive domain which has many applications in survival analysis. The Abd EL-Kader general class has the cdf and pdf given below:

$$G(t \mid \theta) = \exp[-\lambda(t)], \quad t > 0, \tag{3}$$

and

$$g(t \mid \theta) = -\lambda'(t)\exp[-\lambda(t)], \quad t > 0, \tag{4}$$

where $\lambda(t) \equiv \lambda(t;\theta)$ is a non-negative continuous function of $t$ such that $\lambda(t) \to \infty$ as $t \to 0^{+}$ and $\lambda(t) \to 0$ as $t \to \infty$, and $\lambda'(t)$ is the derivative of $\lambda(t)$ with respect to $t$.

Burkschat et al. (2003) studied the dual generalized order statistics (dgos), which enable a common approach to descending ordered random variables such as reversed ordered order statistics, lower record models and lower Pfeifer records.

Let $X(1,n,m,k), X(2,n,m,k), \ldots, X(n,n,m,k)$ be $n$ dgos from an absolutely continuous cdf $F$ with a corresponding pdf $f$. Then the joint pdf has the form

$$f_{X(1,n,m,k),\ldots,X(n,n,m,k)}(x_1,\ldots,x_n) = k\left(\prod_{j=1}^{n-1}\gamma_j\right)\left(\prod_{i=1}^{n-1}[F(x_i)]^{m} f(x_i)\right)[F(x_n)]^{k-1} f(x_n), \tag{5}$$

where $F^{-1}(1) > x_1 \ge x_2 \ge \cdots \ge x_n > F^{-1}(0)$, $n \in \mathbb{N}$, $k \ge 1$, $m_1 = \cdots = m_{n-1} = m$, and $m \in \mathbb{R}$ is a parameter such that $\gamma_r = k + (n-r)(m+1) \ge 1$ for all $1 \le r \le n$.
The marginal pdf of the $r$th dgos $X(r,n,m,k)$, $1 \le r \le n$, is given by

$$f_{X(r,n,m,k)}(x) = \frac{c_{r-1}}{(r-1)!}[F(x)]^{\gamma_r-1} f(x)\, g_m^{\,r-1}(F(x)), \tag{6}$$

where $c_{r-1} = \prod_{j=1}^{r}\gamma_j$, $g_m(x) = h_m(x) - h_m(1)$, $x \in [0,1)$, and

$$h_m(x) = \begin{cases} -\dfrac{1}{m+1}\,x^{m+1}, & m \ne -1, \\[4pt] -\ln x, & m = -1. \end{cases} \tag{7}$$

2. Exponentiated generalized general class of inverted distributions 


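In this section, the new EGGC of inverted distributions is introduced. Using (1)-(3), the cdf and pdf can be derived as follows:

$$F(t;\alpha,\beta) = \left(1 - \{1 - \exp[-\lambda(t)]\}^{\alpha}\right)^{\beta}, \quad t > 0;\ \alpha, \beta > 0, \tag{8}$$

and

$$f(t;\alpha,\beta) = \alpha\beta\,[-\lambda'(t)]\exp[-\lambda(t)]\{1 - \exp[-\lambda(t)]\}^{\alpha-1}\left(1 - \{1 - \exp[-\lambda(t)]\}^{\alpha}\right)^{\beta-1}, \quad t > 0;\ \alpha, \beta > 0, \tag{9}$$

where $\lambda'(t)$ is the first derivative of $\lambda(t)$ with respect to $t$.
The reliability function (rf), hazard rate function (hrf) and reversed hazard rate function (rhrf) are given, respectively, by

$$R(t;\alpha,\beta) = 1 - \left(1 - \{1 - \exp[-\lambda(t)]\}^{\alpha}\right)^{\beta}, \quad t > 0;\ \alpha, \beta > 0, \tag{10}$$

$$h(t;\alpha,\beta) = \frac{\alpha\beta\,[-\lambda'(t)]\exp[-\lambda(t)]\{1 - \exp[-\lambda(t)]\}^{\alpha-1}\left(1 - \{1 - \exp[-\lambda(t)]\}^{\alpha}\right)^{\beta-1}}{1 - \left(1 - \{1 - \exp[-\lambda(t)]\}^{\alpha}\right)^{\beta}}, \quad t > 0;\ \alpha, \beta > 0, \tag{11}$$

and

$$rh(t;\alpha,\beta) = \alpha\beta\,[-\lambda'(t)]\exp[-\lambda(t)]\{1 - \exp[-\lambda(t)]\}^{\alpha-1}\left(1 - \{1 - \exp[-\lambda(t)]\}^{\alpha}\right)^{-1}, \quad t > 0;\ \alpha, \beta > 0. \tag{12}$$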
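Equations (8) to (12) share the same building blocks, so they can be generated from a single user-supplied $\lambda(t)$. The sketch below is illustrative only: the factory function `eggc` and its argument names are hypothetical, and the inverted-Rayleigh-type choice $\lambda(t) = (\gamma t)^{-2}$ with $\gamma = 0.8$ is an assumed example.

```python
import math

def eggc(lam_fn, dlam_fn, alpha, beta):
    """Return (cdf, pdf, rf, hrf, rhrf) of the EGGC of inverted distributions.

    lam_fn is lambda(t) and dlam_fn its derivative lambda'(t) (negative for t > 0).
    """
    def parts(t):
        u = -math.expm1(-lam_fn(t))      # 1 - exp[-lambda(t)]
        return u, 1.0 - u ** alpha       # second value is 1 - {1 - exp[-lambda(t)]}^alpha

    def cdf(t):
        return parts(t)[1] ** beta

    def pdf(t):
        u, q = parts(t)
        return (alpha * beta * (-dlam_fn(t)) * math.exp(-lam_fn(t))
                * u ** (alpha - 1.0) * q ** (beta - 1.0))

    def rf(t):
        return 1.0 - cdf(t)

    def hrf(t):
        return pdf(t) / rf(t)

    def rhrf(t):
        return pdf(t) / cdf(t)

    return cdf, pdf, rf, hrf, rhrf

# Assumed example: lambda(t) = (gamma * t)^(-2), so lambda'(t) = -2 gamma^(-2) t^(-3).
gamma_ = 0.8
lam_fn = lambda t: (gamma_ * t) ** -2.0
dlam_fn = lambda t: -2.0 * gamma_ ** -2.0 * t ** -3.0

cdf, pdf, rf, hrf, rhrf = eggc(lam_fn, dlam_fn, alpha=1.7, beta=2.3)
print(cdf(1.3), pdf(1.3), rf(1.3), hrf(1.3), rhrf(1.3))
```

Any other $\lambda(t)$ satisfying the conditions stated after (4) can be plugged in the same way.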


Table 1: Some resulting distributions of EGGC of inverted distributions 


A(t) Fi The resulting distribution 
ah 
‘a ° ae eee y @ EG-inverted Weibull (a,f,y,0) 
t P 7 [Elbatal and Muhammed (2014)] 


EG- inverted Rayleigh (a,f,7) 


(yt)? (1 — {1 — expt} [Fatima et al. (2018)] 
a 
ty t-y EG-Gumbel (4,/,7,9) 
ae q \ ex) exp f 6 )} | [Cordeiro et al. (2013)] 
-In{l-(1 + Ay? d-{1-f1-d+my]}" EG-IKum (a,f,7,@) 


2.1 Estimation for exponentiated generalized general class of inverted distributions based on 
dual generalized order statistics 


This subsection develops estimation of the parameters, rf and hrf from EGGC of inverted distributions based on 
dgos using maximum likelihood (ML) and Bayesian approaches, under squared error and linear-exponential 
(LINEX) loss functions. Also, confidence intervals (CIs) and credible intervals for the parameters, rf and hrf 
are obtained. 
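2.1.1 Maximum likelihood estimation for exponentiated generalized general class of inverted distributions


Suppose that $T(1,n,m,k), T(2,n,m,k), \ldots, T(n,n,m,k)$ are $n$ dgos from the EGGC of inverted distributions. The likelihood function can be obtained by substituting (8) and (9) in (5) as given below:

$$L(\alpha,\beta;\underline{t}) \propto \alpha^{n}\beta^{n}\exp\left[-\sum_{i=1}^{n}\lambda(t_i)\right]\prod_{i=1}^{n}\{1-\exp[-\lambda(t_i)]\}^{\alpha-1}$$

$$\qquad\times\prod_{i=1}^{n-1}\left(1-\{1-\exp[-\lambda(t_i)]\}^{\alpha}\right)^{\beta(m+1)-1}\left(1-\{1-\exp[-\lambda(t_n)]\}^{\alpha}\right)^{\beta k-1}. \tag{13}$$

The natural logarithm of the likelihood function is given by

$$\ell = \ln L(\alpha,\beta;\underline{t}) \propto n\ln\alpha + n\ln\beta - \sum_{i=1}^{n}\lambda(t_i) + (\alpha-1)\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}$$

$$\qquad + [\beta(m+1)-1]\sum_{i=1}^{n-1}\ln\left(1-\{1-\exp[-\lambda(t_i)]\}^{\alpha}\right) + (\beta k-1)\ln\left(1-\{1-\exp[-\lambda(t_n)]\}^{\alpha}\right). \tag{14}$$

Considering that the two parameters $\alpha$ and $\beta$ are unknown and differentiating the log-likelihood function in (14) with respect to $\beta$ and $\alpha$, one obtains

$$\frac{\partial\ell}{\partial\beta} = \frac{n}{\beta} + (m+1)\sum_{i=1}^{n-1}\ln\left(1-\{1-\exp[-\lambda(t_i)]\}^{\alpha}\right) + k\ln\left(1-\{1-\exp[-\lambda(t_n)]\}^{\alpha}\right), \tag{15}$$

and

$$\frac{\partial\ell}{\partial\alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\} - [\beta(m+1)-1]\sum_{i=1}^{n-1}\frac{\{1-\exp[-\lambda(t_i)]\}^{\alpha}\ln\{1-\exp[-\lambda(t_i)]\}}{1-\{1-\exp[-\lambda(t_i)]\}^{\alpha}}$$

$$\qquad - (\beta k-1)\frac{\{1-\exp[-\lambda(t_n)]\}^{\alpha}\ln\{1-\exp[-\lambda(t_n)]\}}{1-\{1-\exp[-\lambda(t_n)]\}^{\alpha}}. \tag{16}$$

Equating (15) to zero, one can obtain the ML estimator of $\beta$,

$$\hat{\beta} = \frac{-n}{(m+1)\sum_{i=1}^{n-1}\ln\left(1-\{1-\exp[-\lambda(t_i)]\}^{\hat{\alpha}}\right) + k\ln\left(1-\{1-\exp[-\lambda(t_n)]\}^{\hat{\alpha}}\right)}. \tag{17}$$

Then the ML estimator of the parameter $\alpha$ can be obtained numerically by substituting (17) in (16).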
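Because (17) gives $\hat{\beta}$ in closed form for any fixed $\alpha$, the two-parameter problem reduces to a one-dimensional search over $\alpha$. The following sketch illustrates that idea under stated assumptions: $\lambda(t) = t^{-2}$ is an assumed example with no unknown parameters of its own, the data are made up and listed in descending order, the search is a plain grid rather than a root-finder for (16), and all function names are hypothetical.

```python
import math

def log_terms(ts, alpha, lam_fn):
    """ln{1 - exp[-lambda(t_i)]} and ln(1 - {1 - exp[-lambda(t_i)]}^alpha) for each t_i."""
    ln_u = [math.log(-math.expm1(-lam_fn(t))) for t in ts]
    ln_q = [math.log1p(-math.exp(alpha * lu)) for lu in ln_u]
    return ln_u, ln_q

def loglik(ts, alpha, beta, lam_fn, m=0.0, k=1.0):
    """Log-likelihood (14), up to the additive constant dropped there."""
    n = len(ts)
    ln_u, ln_q = log_terms(ts, alpha, lam_fn)
    return (n * math.log(alpha) + n * math.log(beta)
            - sum(lam_fn(t) for t in ts)
            + (alpha - 1.0) * sum(ln_u)
            + (beta * (m + 1.0) - 1.0) * sum(ln_q[:-1])
            + (beta * k - 1.0) * ln_q[-1])

def beta_hat(ts, alpha, lam_fn, m=0.0, k=1.0):
    """Closed-form maximizer (17) of the log-likelihood in beta for fixed alpha."""
    _, ln_q = log_terms(ts, alpha, lam_fn)
    return -len(ts) / ((m + 1.0) * sum(ln_q[:-1]) + k * ln_q[-1])

def profile_alpha(ts, lam_fn, grid, m=0.0, k=1.0):
    """Grid value of alpha maximizing the profile log-likelihood."""
    return max(grid, key=lambda a: loglik(ts, a, beta_hat(ts, a, lam_fn, m, k),
                                          lam_fn, m, k))

lam_fn = lambda t: t ** -2.0                 # assumed lambda(t)
ts = [2.9, 2.1, 1.6, 1.2, 0.9, 0.7]          # made-up dgos, descending
grid = [0.1 * j for j in range(1, 101)]
a_hat = profile_alpha(ts, lam_fn, grid)
b_hat = beta_hat(ts, a_hat, lam_fn)
print(a_hat, b_hat)
```

With `m=0` and `k=1` the dgos are ordinary order statistics in descending order; other values of `m` and `k` only change the weights in (14) and (17).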
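The invariance property of ML estimation can be used to obtain the ML estimates $\hat{R}(t)$ and $\hat{h}(t)$, by just replacing the parameters by their corresponding ML estimates, as given below:

$$\hat{R}(t) = 1 - \left(1 - \{1-\exp[-\lambda(t)]\}^{\hat{\alpha}}\right)^{\hat{\beta}}, \quad t > 0, \tag{18}$$

and

$$\hat{h}(t) = \frac{\hat{\alpha}\hat{\beta}\,[-\lambda'(t)]\exp[-\lambda(t)]\{1-\exp[-\lambda(t)]\}^{\hat{\alpha}-1}\left(1-\{1-\exp[-\lambda(t)]\}^{\hat{\alpha}}\right)^{\hat{\beta}-1}}{1-\left(1-\{1-\exp[-\lambda(t)]\}^{\hat{\alpha}}\right)^{\hat{\beta}}}, \quad t > 0. \tag{19}$$


Asymptotic variance-covariance matrix of maximum likelihood estimators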

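The asymptotic variance-covariance matrix of the EGGC of inverted distributions for the two parameters $\alpha$ and $\beta$ is the inverse of the Fisher information matrix as follows:

$$I^{-1} \approx \begin{bmatrix} -\dfrac{\partial^{2}\ell}{\partial\alpha^{2}} & -\dfrac{\partial^{2}\ell}{\partial\alpha\,\partial\beta} \\[6pt] -\dfrac{\partial^{2}\ell}{\partial\beta\,\partial\alpha} & -\dfrac{\partial^{2}\ell}{\partial\beta^{2}} \end{bmatrix}^{-1}_{(\hat{\alpha},\hat{\beta})} = \begin{bmatrix} \widehat{\mathrm{var}}(\hat{\alpha}) & \widehat{\mathrm{cov}}(\hat{\alpha},\hat{\beta}) \\ \widehat{\mathrm{cov}}(\hat{\alpha},\hat{\beta}) & \widehat{\mathrm{var}}(\hat{\beta}) \end{bmatrix},$$

with

$$\frac{\partial^{2}\ell}{\partial\beta^{2}} = \frac{-n}{\beta^{2}}, \tag{20}$$

$$\frac{\partial^{2}\ell}{\partial\alpha^{2}} = \frac{-n}{\alpha^{2}} - [\beta(m+1)-1]\sum_{i=1}^{n-1}\frac{\{1-\exp[-\lambda(t_i)]\}^{\alpha}\left(\ln\{1-\exp[-\lambda(t_i)]\}\right)^{2}}{\left(1-\{1-\exp[-\lambda(t_i)]\}^{\alpha}\right)^{2}} - (\beta k-1)\frac{\{1-\exp[-\lambda(t_n)]\}^{\alpha}\left(\ln\{1-\exp[-\lambda(t_n)]\}\right)^{2}}{\left(1-\{1-\exp[-\lambda(t_n)]\}^{\alpha}\right)^{2}}, \tag{21}$$

and

$$\frac{\partial^{2}\ell}{\partial\beta\,\partial\alpha} = -(m+1)\sum_{i=1}^{n-1}\frac{\{1-\exp[-\lambda(t_i)]\}^{\alpha}\ln\{1-\exp[-\lambda(t_i)]\}}{1-\{1-\exp[-\lambda(t_i)]\}^{\alpha}} - k\,\frac{\{1-\exp[-\lambda(t_n)]\}^{\alpha}\ln\{1-\exp[-\lambda(t_n)]\}}{1-\{1-\exp[-\lambda(t_n)]\}^{\alpha}}. \tag{22}$$

The asymptotic normality of the ML estimators can be used to compute the asymptotic $100(1-\omega)\%$ CIs for $\alpha$ and $\beta$ as

$$\hat{\alpha} \pm Z_{\omega/2}\sqrt{\widehat{\mathrm{var}}(\hat{\alpha})} \quad\text{and}\quad \hat{\beta} \pm Z_{\omega/2}\sqrt{\widehat{\mathrm{var}}(\hat{\beta})}. \tag{23}$$

Also, the asymptotic $100(1-\omega)\%$ CIs for the rf and hrf are given by

$$\hat{R}(t) \pm Z_{\omega/2}\sqrt{\widehat{\mathrm{var}}(\hat{R}(t))} \quad\text{and}\quad \hat{h}(t) \pm Z_{\omega/2}\sqrt{\widehat{\mathrm{var}}(\hat{h}(t))}, \tag{24}$$

where $Z_{\omega/2}$ is the upper $\omega/2$ percentile of the standard normal distribution and $(1-\omega)$ is the confidence coefficient.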


2.1.2 Bayesian estimation for exponentiated generalized general class of inverted distributions 


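Bayesian estimation of the parameters, rf and hrf based on dgos under the squared error (SE) and LINEX loss functions is considered. Also, the credible intervals of the parameters are obtained.

Assuming that the parameters $\alpha$ and $\beta$ of the EGGC distributions are random variables with gamma prior distributions, the joint prior density function of $\alpha$ and $\beta$ is given by

$$\pi(\alpha,\beta) \propto \alpha^{c_1-1}\beta^{c_2-1}e^{-(d_1\alpha + d_2\beta)}, \tag{25}$$

where $c_1$, $c_2$, $d_1$, $d_2$ are hyperparameters.
The joint posterior of $\alpha$ and $\beta$ can be derived by using (13) and (25) as follows:

$$\pi(\alpha,\beta \mid \underline{t}) \propto L(\alpha,\beta;\underline{t})\,\pi(\alpha,\beta) \propto \alpha^{n+c_1-1}\beta^{n+c_2-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\,e^{-\beta\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]}\prod_{i=1}^{n}Q_i^{-1},$$

where

$$Q_i = 1-\{1-\exp[-\lambda(t_i)]\}^{\alpha}, \quad i = 1, 2, \ldots, n. \tag{26}$$

Hence, the joint posterior distribution of $\alpha$ and $\beta$ is

$$\pi(\alpha,\beta \mid \underline{t}) = \frac{\alpha^{n+c_1-1}\beta^{n+c_2-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\,e^{-\beta\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\,\Gamma(n+c_2)}, \tag{27}$$

where

$$\varphi_1 = \int_0^{\infty}\frac{\alpha^{n+c_1-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\prod_{i=1}^{n}Q_i^{-1}}{\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]^{n+c_2}}\,d\alpha. \tag{28}$$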

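a. Point estimation
The Bayes estimators of the parameters, rf and hrf based on dgos under the SE and LINEX loss functions of the EGGC of inverted distributions are obtained.

I. Bayesian estimation for exponentiated generalized general class of inverted distributions under squared error loss function

Under the SE loss function, the Bayes estimators of the parameters $\alpha$ and $\beta$ are given by their marginal posterior expectations using (27), with $Q_i$ as in (26) and $\varphi_1$ as in (28), as given below:

$$\tilde{\alpha}_{(SE)} = E(\alpha \mid \underline{t}) = \int_0^{\infty}\frac{\alpha^{n+c_1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]^{n+c_2}}\,d\alpha, \tag{29}$$

and

$$\tilde{\beta}_{(SE)} = E(\beta \mid \underline{t}) = \int_0^{\infty}\frac{(n+c_2)\,\alpha^{n+c_1-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]^{n+c_2+1}}\,d\alpha. \tag{30}$$

The Bayes estimators of the rf and hrf under the SE loss function are as follows:

$$\tilde{R}_{(SE)}(t) = E(R(t) \mid \underline{t}) = 1-\int_0^{\infty}\frac{\alpha^{n+c_1-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n-\ln Q\right]^{n+c_2}}\,d\alpha, \tag{31}$$

and

$$\tilde{h}_{(SE)}(t) = E(h(t) \mid \underline{t}) = \int_0^{\infty}\!\!\int_0^{\infty}\frac{\alpha^{n+c_1}\beta^{n+c_2}\,[-\lambda'(t)]\exp[-\lambda(t)]\{1-\exp[-\lambda(t)]\}^{\alpha-1}Q^{\beta-1}}{\varphi_1\,(1-Q^{\beta})\,\Gamma(n+c_2)}$$

$$\qquad\times e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\,e^{-\beta\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]}\prod_{i=1}^{n}Q_i^{-1}\,d\alpha\,d\beta, \tag{32}$$

where

$$Q = 1-\{1-\exp[-\lambda(t)]\}^{\alpha}, \tag{33}$$

and $Q_i$ and $Q_n$ are given by (26).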
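II. Bayesian estimation for exponentiated generalized general class of inverted distributions under linear exponential loss function
Under the LINEX loss function, the Bayes estimators for the shape parameters $\alpha$ and $\beta$ are given, respectively, by

$$\tilde{\alpha}_{(LNX)} = -\frac{1}{\nu}\ln E(e^{-\nu\alpha} \mid \underline{t}) = -\frac{1}{\nu}\ln\int_0^{\infty}\frac{\alpha^{n+c_1-1}\,e^{-\alpha\left(\nu+d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]^{n+c_2}}\,d\alpha, \tag{34}$$

and

$$\tilde{\beta}_{(LNX)} = -\frac{1}{\nu}\ln E(e^{-\nu\beta} \mid \underline{t}) = -\frac{1}{\nu}\ln\int_0^{\infty}\frac{\alpha^{n+c_1-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\left[\nu+d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]^{n+c_2}}\,d\alpha. \tag{35}$$

Also, the Bayes estimator for the rf based on dgos is

$$\tilde{R}_{(LNX)}(t) = -\frac{1}{\nu}\ln E(e^{-\nu R(t)} \mid \underline{t}) = -\frac{1}{\nu}\ln\left[\int_0^{\infty}\!\!\int_0^{\infty}\frac{\alpha^{n+c_1-1}\beta^{n+c_2-1}\,e^{-\nu(1-Q^{\beta})}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}}{\varphi_1\,\Gamma(n+c_2)}\right.$$

$$\qquad\left.\times\, e^{-\beta\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]}\prod_{i=1}^{n}Q_i^{-1}\,d\alpha\,d\beta\right], \tag{36}$$

and the Bayes estimator for the hrf based on dgos is given by

$$\tilde{h}_{(LNX)}(t) = -\frac{1}{\nu}\ln E(e^{-\nu h(t)} \mid \underline{t}) = -\frac{1}{\nu}\ln\left[\int_0^{\infty}\!\!\int_0^{\infty}\frac{\alpha^{n+c_1-1}\beta^{n+c_2-1}}{\varphi_1\,\Gamma(n+c_2)}\exp\left\{-\nu\,\frac{\alpha\beta\,[-\lambda'(t)]\exp[-\lambda(t)]\{1-\exp[-\lambda(t)]\}^{\alpha-1}Q^{\beta-1}}{1-Q^{\beta}}\right\}\right.$$

$$\qquad\left.\times\, e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\,e^{-\beta\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]}\prod_{i=1}^{n}Q_i^{-1}\,d\alpha\,d\beta\right], \tag{37}$$

where $Q_i$ and $Q_n$ are defined in (26), $\varphi_1$ is given in (28) and $Q$ is defined in (33).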
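The LINEX estimators above all have the form $-\frac{1}{\nu}\ln E(e^{-\nu\theta} \mid \underline{t})$. The chapter evaluates the expectation by numerical integration; the sketch below illustrates the same formula on a set of posterior draws instead, which is a different technique swapped in for brevity. The draws are simulated from an arbitrary gamma distribution purely as a stand-in for a posterior sample, and the function name is hypothetical.

```python
import math
import random

def linex_estimate(draws, nu):
    """Monte Carlo version of -(1/nu) * ln E[exp(-nu * theta)] from posterior draws."""
    # Subtract the largest exponent before exponentiating to avoid overflow.
    shift = max(-nu * d for d in draws)
    mean_exp = sum(math.exp(-nu * d - shift) for d in draws) / len(draws)
    return -(shift + math.log(mean_exp)) / nu

rng = random.Random(0)
draws = [rng.gammavariate(3.0, 0.5) for _ in range(2000)]  # stand-in posterior sample

se_est = sum(draws) / len(draws)        # Bayes estimate under SE loss (posterior mean)
lnx_est = linex_estimate(draws, nu=1.0)
print(se_est, lnx_est)
```

For $\nu > 0$ the LINEX estimate does not exceed the posterior mean (by Jensen's inequality), and the two coincide in the limit $\nu \to 0$, which is a convenient sanity check when implementing (34) to (37).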
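b. Credible intervals
In general, $(L(\underline{t}), U(\underline{t}))$ is the $100(1-\omega)\%$ credible interval of $\theta$ if

$$P[L(\underline{t}) < \theta < U(\underline{t}) \mid \underline{t}] = \int_{L(\underline{t})}^{U(\underline{t})}\pi(\theta \mid \underline{t})\,d\theta = 1-\omega, \tag{38}$$

where $L(\underline{t})$ and $U(\underline{t})$ are the lower limit (LL) and upper limit (UL) and $\omega$ is the level of significance.
Since the posterior distribution is given by (27), a $100(1-\omega)\%$ credible interval for $\alpha$ is $(L(\underline{t}), U(\underline{t}))$, where

$$P[\alpha > L(\underline{t}) \mid \underline{t}] = \int_{L(\underline{t})}^{\infty}\frac{\alpha^{n+c_1-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]^{n+c_2}}\,d\alpha = 1-\frac{\omega}{2}, \tag{39}$$

and

$$P[\alpha > U(\underline{t}) \mid \underline{t}] = \int_{U(\underline{t})}^{\infty}\frac{\alpha^{n+c_1-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]^{n+c_2}}\,d\alpha = \frac{\omega}{2}. \tag{40}$$

Also, a $100(1-\omega)\%$ credible interval for $\beta$ is $(L(\underline{t}), U(\underline{t}))$, where

$$P[\beta > L(\underline{t}) \mid \underline{t}] = \int_{L(\underline{t})}^{\infty}\!\!\int_0^{\infty}\frac{\alpha^{n+c_1-1}\beta^{n+c_2-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\,e^{-\beta\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\,\Gamma(n+c_2)}\,d\alpha\,d\beta = 1-\frac{\omega}{2}, \tag{41}$$

and

$$P[\beta > U(\underline{t}) \mid \underline{t}] = \int_{U(\underline{t})}^{\infty}\!\!\int_0^{\infty}\frac{\alpha^{n+c_1-1}\beta^{n+c_2-1}\,e^{-\alpha\left(d_1-\sum_{i=1}^{n}\ln\{1-\exp[-\lambda(t_i)]\}\right)}\,e^{-\beta\left[d_2-(m+1)\sum_{i=1}^{n-1}\ln Q_i-k\ln Q_n\right]}\prod_{i=1}^{n}Q_i^{-1}}{\varphi_1\,\Gamma(n+c_2)}\,d\alpha\,d\beta = \frac{\omega}{2}, \tag{42}$$

where $Q_i$ and $Q_n$ are defined in (26) and $\varphi_1$ is given in (28).
All the previous equations can be solved numerically.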


2.2 Prediction for exponentiated generalized general class of inverted distributions based on 
dual generalized order statistics 


This subsection considers ML and Bayesian prediction (point and interval) for a future observation of the 
EGGC of inverted distributions based on dgos, under and LINEX loss functions. 

Let T(1,n,m,k),..., T(4n,m,k) be a dgos of size n with the pdf f(t;a,f) and suppose Y(1,n,, m,, k,),..-, 
Y(r,,N,,m,,k,), k, > 0, m, ER is a second independent dgos of size n,, of future observations from the same 
distribution. Using (6)- (9), the pdf of the dgos Y(,) can be obtained just replacing t,) by y(,) as follows: 


F(Yy|% 8) = * Tie Ayn )lexP[-A()) {1 - exP[-AG 


- (43) 
a\PYs— 1 
x (1— {1 — exp[-A()]}") a, [F(¥)), 
where, (51 =[]j,=17j,» 9u(¥s) = hu (Ys) — hu (1), YF = ky + (ny —7)(mMy +1). 
o B my+ s-1 
aft (temo orp meat, 
Ging’ (F(¥@)) = (44) 


[-m(1-(1- eae) ] | omy act. 


For the future sample of size n,, let Y/,. denote the s“ ordered lifetime, 1 <s < ny, hence the pdf of the dgos 
Y(,) from EGGC of inverted distributions is obtained by substituting (44) in (43). 


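Case one: for $m_y \neq -1$

$$
\begin{aligned}
f(y_{(s)}\mid\alpha,\beta) ={}& \frac{c_{s-1}}{(m_y+1)^{s-1}(s-1)!}\,\alpha\beta\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha-1} \\
&\times\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha}\right)^{\beta\gamma_s-1}\left[1-\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha}\right)^{\beta(m_y+1)}\right]^{s-1}.
\end{aligned}
$$

Using the binomial expansion, one obtains,

$$
\begin{aligned}
f(y_{(s)}\mid\alpha,\beta) ={}& \frac{c_{s-1}}{(m_y+1)^{s-1}(s-1)!}\,\alpha\beta\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha-1} \\
&\times\sum_{i_1=0}^{s-1}(-1)^{i_1}\binom{s-1}{i_1}\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha}\right)^{\beta[\gamma_s+i_1(m_y+1)]-1}.
\end{aligned}
$$

Let,

$$
\xi=\frac{c_{s-1}}{(m_y+1)^{s-1}(s-1)!},\qquad \eta_{i_1}=(-1)^{i_1}\binom{s-1}{i_1}\qquad\text{and}\qquad \omega_{i_1}=[\gamma_s+i_1(m_y+1)].
$$

Then,

$$
\begin{aligned}
f(y_{(s)}\mid\alpha,\beta) ={}& \xi\,\alpha\beta\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha-1} \\
&\times\sum_{i_1=0}^{s-1}\eta_{i_1}\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha}\right)^{\beta\omega_{i_1}-1},\qquad y_{(s)}>0;\ \alpha,\beta>0.
\end{aligned}
\tag{45}
$$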


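Case two: for $m_y = -1$

$$
\begin{aligned}
f(y_{(s)}\mid\alpha,\beta) ={}& \frac{k_y^{s}}{(s-1)!}\,\alpha\beta^{s}\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha-1} \\
&\times\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha}\right)^{\beta k_y-1}\left[-\ln\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\alpha}\right)\right]^{s-1},\qquad y_{(s)}>0;\ \alpha,\beta>0.
\end{aligned}
\tag{46}
$$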


2.2.1 Maximum likelihood prediction for exponentiated generalized general class of inverted distributions 


Two-sample ML point and interval predictors are obtained based on dgos. The ML prediction can be derived
by using the conditional density function of the $s$th future dgos, which is given by (45) and (46), after replacing
the parameters $(\alpha,\beta)$ by their ML estimators $(\hat\alpha,\hat\beta)$.

a. Point prediction 
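The ML predictors (MLP) of the future dgos $Y_{(s)}$ can be obtained using (45) and (46) as follows:


Case one: for $m_y \neq -1$

$$
\begin{aligned}
\hat{y}_{(s)} ={}& \int_0^\infty y_{(s)}\,\xi\,\hat\alpha\hat\beta\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha-1} \\
&\times\sum_{i_1=0}^{s-1}\eta_{i_1}\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)^{\hat\beta\omega_{i_1}-1}\,dy_{(s)}.
\end{aligned}
\tag{47}
$$

Case two: for $m_y = -1$

$$
\begin{aligned}
\hat{y}_{(s)} ={}& \int_0^\infty y_{(s)}\,\frac{k_y^{s}}{(s-1)!}\,\hat\alpha\hat\beta^{s}\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha-1} \\
&\times\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)^{\hat\beta k_y-1}\left[-\ln\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)\right]^{s-1}dy_{(s)}.
\end{aligned}
\tag{48}
$$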
b. Interval prediction 


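In general, the ML predictive bounds (MLPB) for the future dgos $Y_{(s)}$, $1 \le s \le n_y$, can be derived as follows:

$$
P[L(t)\le y_{(s)}\le U(t)\mid t]=\int_{L(t)}^{U(t)} f(y_{(s)}\mid\hat\alpha,\hat\beta)\,dy_{(s)}=1-\omega .
$$

Case one: for $m_y \neq -1$

$$
\begin{aligned}
P[y_{(s)}>L(t)\mid t] ={}& \int_{L(t)}^{\infty}\xi\,\hat\alpha\hat\beta\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha-1} \\
&\times\sum_{i_1=0}^{s-1}\eta_{i_1}\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)^{\hat\beta\omega_{i_1}-1}dy_{(s)}=1-\frac{\omega}{2},
\end{aligned}
\tag{49}
$$

and

$$
\begin{aligned}
P[y_{(s)}>U(t)\mid t] ={}& \int_{U(t)}^{\infty}\xi\,\hat\alpha\hat\beta\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha-1} \\
&\times\sum_{i_1=0}^{s-1}\eta_{i_1}\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)^{\hat\beta\omega_{i_1}-1}dy_{(s)}=\frac{\omega}{2}.
\end{aligned}
\tag{50}
$$

Case two: for $m_y = -1$

$$
\begin{aligned}
P[y_{(s)}>L(t)\mid t] ={}& \int_{L(t)}^{\infty}\frac{k_y^{s}}{(s-1)!}\,\hat\alpha\hat\beta^{s}\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha-1} \\
&\times\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)^{\hat\beta k_y-1}\left[-\ln\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)\right]^{s-1}dy_{(s)}=1-\frac{\omega}{2},
\end{aligned}
\tag{51}
$$

and

$$
\begin{aligned}
P[y_{(s)}>U(t)\mid t] ={}& \int_{U(t)}^{\infty}\frac{k_y^{s}}{(s-1)!}\,\hat\alpha\hat\beta^{s}\,[-\lambda'(y_{(s)})]\exp[-\lambda(y_{(s)})]\,\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha-1} \\
&\times\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)^{\hat\beta k_y-1}\left[-\ln\left(1-\{1-\exp[-\lambda(y_{(s)})]\}^{\hat\alpha}\right)\right]^{s-1}dy_{(s)}=\frac{\omega}{2}.
\end{aligned}
\tag{52}
$$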


2.2.2 Bayesian prediction for exponentiated generalized general class of inverted distributions


The Bayesian two-sample prediction is considered based on dgos for the future observation $Y_{(s)}$, $1 \le s \le n_y$.
The Bayesian predictive density (BPD) function can be derived as follows:

$$
f(y_{(s)}\mid t)=\int_0^\infty\!\!\int_0^\infty f(y_{(s)}\mid\alpha,\beta)\,\pi(\alpha,\beta\mid t)\,d\alpha\,d\beta, \tag{53}
$$

where $\pi(\alpha,\beta\mid t)$ is the joint posterior distribution of $\alpha,\beta$ and $f(y_{(s)}\mid\alpha,\beta)$ is the pdf of $Y_{(s)}$.
Case one: for $m_y \neq -1$


The BPD of $Y_{(s)}$ given $t$ is obtained by substituting (27) and (45) in (53) as given below


E(n+ ca)ant[— A(y(sy) exp[-A(sy)] 
f(% t+ea+ 
(IE) = he i(1 - {1 — exp[—A(s))]} *) TT oe ea (54) 


x : — exp[-A(yi) [Je ttn exPL-AOD) dar, 


where, t3 = [(dz — (m+ 1) H7y'InQ; — kInQ,) — YF wi, In [n.(2 cot is exp[-A(y¢) |} J] 
a a = (1) (5,7), wi, = [Fs + amy + 1], Q,and Q, are defined in (26) 


= (my+1)°(s-1)!’ Ta 
Case two: for $m_y = -1$
Substituting (27) and (46) in (53), the BPD of $Y_{(s)}$ given $t$ can be obtained as follows:


FYI) 
= ie sa? [-A(yen Jexr[ Ao {1 = ex7[-A eo)" T+ @ +5) s 
0 ,(s—1)! (1 ce he exp[-A(¢s))]}") TE QP + cz) aS (>) 


x [-In(1- {1 exr[-AO)})] oe 8(tat Ei (t-exPl-ACHDD) dar, 


where, 

ty = [Ca —(m+1) YE} InQ — kInQ,) —ky In(1 ar oo exp[-A(s))}} QO, and Q,, are defined in (26). 
a. Point prediction 

The Bayes predictors (BP) of the future dgos $Y_{(s)}$ can be derived under the SE and LINEX loss functions as given
below:


Case one: for $m_y \neq -1$
The BP of the future dgos $Y_{(s)}$ can be obtained under the SE loss function using (54) as follows:
Yisrysey = E(Msylt) = fo Yosy fa(Msylt) €V(su > 


= i ; I Yoon (n+ C2) [A on)] ex [-A(Y oxy {1 = €xP[-A(Yeon JJ} SEE exP- AOD) 
0 0 


dady 
a (sl) 
ai(1— {1 — exp[-AQox))]}") M1 Ge S 
The BP of the future dgos $Y_{(s)}$ can be obtained under the LINEX loss function using (54) as given below:
g (sl) g 
-—1 be 
¥(s1) (NX) = —InE(e vest), 
E(e Men |t) = fy eM fa(Yeox)|t) (61) 67) 
— c2)a"*11 Aeon )JexP[-A(yeon) {1 — 2x? [-AY en ))} eH AeD) 
= a dys): 
0 Yo g(t — {1 — exp[-AQey)]}) Me @ te ° 
Case two: for $m_y = -1$
The BP of the future dgos $Y_{(s)}$ can be obtained under the SE loss function using (55) as follows:
Y(s2)(SE) =E (2) t) = 
a-1 a. 482-1 n 
co pea Yeayhy Ot | Ay (29) ex] -A(y¢o0y) {1-22 | -A( 0 ~In(1-{1-exp]-A0(q9y) o(41+Efa1 er ADD) r(ntcy+52) a 
rue 91(82-1)'(1-f1-ex7[-20)]}) THe QP @+e2) aa ae a Y (52) é (58) 


The BP of the future dgos can be obtained under LINEX loss function using (55) as follows: 


-1 
Y(s2)(LNX) = ee In E(e-*¥e)|t), (59) 
where,
E(e ~VY(s2) |t) = 


[* [Ratatat Aen AG Ife Aen 
91 (s2 HI(1 {1 exp| AQ(s2))]} oie 1 QT (n+ co) qos 


x [- In(1 -{1- exp[-AQe2))}")] 7 e714 +Dinat-exPl-A] dard y (sy . 


b. Interval prediction 
The Bayes predictive bounds (BPB) of the future dgos $Y_{(s)}$ can be obtained using (54) and (55) as given
below:


Case one: for $m_y \neq -1$
Pon > LO)IE] 


-[ i. §(n + coya"*4[-A (yoo J exP[-A(s1))] 
u()Jo gy(1 — {1 —exp[-AGsn)]}) a @ THY (60) 


7 ‘ i 
x {1 — exp[-A(yo1)]}" 1 oa (dy +E Pa(1-expl-AtOD) day dys1 = 1-5, 


and 


P[%s1) > UIE] 


= ie E(n + cg)ar[— A(y(s1)) ]exp[- A(y(s1)] ” 
u(t) 4o o1( (4 {1 — exp[-A(s1)) ]} “yi Qt potent) (61) 


x {1- exp[—A(yis1y)]} eA Es exPL-AOD) da dy(s1) = ; 


Case two: for $m_y = -1$


PLY (62) > LOMA = ae i 
oo kaa [-A(y¢52y)]expl-A(y¢say)If1-exP[-A(Yoap)]}° - [-In(1-{1-ex[-AO oa) )}")[_ eal PAOD ren ten +2) og 
Sie fo 1(s2-1)!(1-{1-exp[-AQ(62y)]}) Mea aE nen) ty? (82) 
a pee (62) 
3 
oo kan ter) — A(y(s2))Jexp[- A(Y(s2))|{1-exp[- My¢s2))]}" TE In(4 —{1-exp[- aoa")! ~a(dy +07 (2-exP[-A(E)) perg.cyts2) a 
ie Jo (s2-1)!(1-{1-exp[-A(v 52))]}") Tea GMa) ty) 782) 
=2, (63) 


All the previous equations can be solved numerically. 


3. The exponentiated generalized inverted Kumaraswamy distribution 


This section is devoted to introducing the EG inverted Kumaraswamy (EG-IKum) distribution as a new
distribution, which is a special model from the EGGC of inverted distributions. Some of its properties are studied
through the rf, hrf and rhrf, a graphical description, moments and related measures, and the mean residual life and mean
past lifetime. The ML estimators and CIs for the parameters, rf and hrf of the EG-IKum distribution based on
dgos are obtained. Also, the shape parameters, rf and hrf of the EG-IKum distribution are estimated using
the Bayesian approach. The Bayes estimators are derived under the SE and the LINEX loss functions based
on dgos. Credible intervals for the parameters, rf and hrf are derived. The Bayesian prediction (point and
interval) for a future observation of the EG-IKum distribution is obtained based on dgos. All results are
specialized to lower record values and a numerical study is presented. Moreover, three real data sets are used
to illustrate the flexibility of the distribution.

Assume that $T$ is a random variable distributed as EG-IKum, which is a special model
from the EGGC of inverted distributions in (8) when $\lambda(t) = -\ln\{[1-(1+t)^{-\theta_3}]^{\theta_4}\}$, with shape parameters
$\theta = (\theta_1,\theta_2,\theta_3,\theta_4)' > 0$, denoted by $T \sim$ EG-IKum$(\theta)$. Then the cdf and pdf are given by,


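$$
F_{EGIK}(t;\theta)=\left(1-\left\{1-\left[1-(1+t)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right)^{\theta_1},\qquad t>0,\ \theta_j>0,\ j=1,2,3,4, \tag{64}
$$

and

$$
\begin{aligned}
f_{EGIK}(t;\theta)={}&\prod_{j=1}^{4}\theta_j\,(1+t)^{-(\theta_3+1)}\left[1-(1+t)^{-\theta_3}\right]^{\theta_4-1}\left\{1-\left[1-(1+t)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1} \\
&\times\left(1-\left\{1-\left[1-(1+t)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right)^{\theta_1-1},\qquad t>0,\ \theta>0.
\end{aligned}
\tag{65}
$$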
i. Reliability function, hazard rate and reversed hazard rate functions 
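The rf, hrf and rhrf of EG-IKum are given, respectively, by,

$$
R_{EGIK}(t;\theta)=1-\left(1-\left\{1-\left[1-(1+t)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right)^{\theta_1},\qquad t>0,\ \theta>0, \tag{66}
$$

$$
\begin{aligned}
h_{EGIK}(t;\theta)={}&\frac{\prod_{j=1}^{4}\theta_j\,(1+t)^{-(\theta_3+1)}\left[1-(1+t)^{-\theta_3}\right]^{\theta_4-1}}{R_{EGIK}(t;\theta)}\left\{1-\left[1-(1+t)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1} \\
&\times\left(1-\left\{1-\left[1-(1+t)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right)^{\theta_1-1},\qquad t>0,\ \theta>0,
\end{aligned}
\tag{67}
$$

and

$$
\begin{aligned}
rh_{EGIK}(t;\theta)={}&\prod_{j=1}^{4}\theta_j\,(1+t)^{-(\theta_3+1)}\left[1-(1+t)^{-\theta_3}\right]^{\theta_4-1}\left\{1-\left[1-(1+t)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1} \\
&\times\left(1-\left\{1-\left[1-(1+t)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right)^{-1},\qquad t>0,\ \theta>0.
\end{aligned}
\tag{68}
$$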


Plots of the pdf, hrf and rhrf of EG-IKum are given, respectively, in Figures 1-3. The plots in Figure 1
and Figure 2 show the behavior of the pdf and hrf, which are positively skewed and unimodal for different
values of the shape parameters. Also, the behavior of the hrf indicates that the model possesses the
non-monotone property, which is useful for applications in reliability studies, clinical trial studies and the analysis
of different data sets. From Figure 3 one can observe that the curves of the rhrf, for all parameter values shown, are
decreasing and then constant.

ii. Quantiles 
The quantile function of the EG-IKum is given by

$$
t_q=\left\{1-\left[1-\left(1-q^{1/\theta_1}\right)^{1/\theta_2}\right]^{1/\theta_4}\right\}^{-1/\theta_3}-1,\qquad 0<q<1. \tag{69}
$$


Special cases can be obtained using (69) such as the second quartile (median), when q = 0.5, first quartile 
when q = 0.25 and third quartile when q = 0.75. 
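
As an illustration of (64), (65) and (69), the following Python sketch evaluates the cdf, pdf and quantile function of the EG-IKum distribution. The function names and the argument order `(th1, th2, th3, th4)` are assumptions made for this example only; the code is a minimal sketch, not the authors' implementation.

```python
import math


def egikum_cdf(t, th1, th2, th3, th4):
    """cdf (64): (1 - {1 - [1 - (1+t)^(-th3)]^th4}^th2)^th1 for t > 0."""
    u = 1.0 - (1.0 + t) ** (-th3)
    q_star = 1.0 - (1.0 - u ** th4) ** th2
    return q_star ** th1


def egikum_pdf(t, th1, th2, th3, th4):
    """pdf (65), the derivative of the cdf with respect to t."""
    u = 1.0 - (1.0 + t) ** (-th3)
    w = u ** th4
    q_star = 1.0 - (1.0 - w) ** th2
    return (th1 * th2 * th3 * th4
            * (1.0 + t) ** (-(th3 + 1.0))
            * u ** (th4 - 1.0)
            * (1.0 - w) ** (th2 - 1.0)
            * q_star ** (th1 - 1.0))


def egikum_quantile(q, th1, th2, th3, th4):
    """Quantile function (69), obtained by inverting the cdf, 0 < q < 1."""
    inner = 1.0 - (1.0 - q ** (1.0 / th1)) ** (1.0 / th2)
    return (1.0 - inner ** (1.0 / th4)) ** (-1.0 / th3) - 1.0
```

For example, `egikum_quantile(0.5, 0.8, 0.6, 1.2, 1.5)` returns the median for that parameter choice, and applying `egikum_cdf` to the result should return 0.5 up to rounding error.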
Figure 1: Plots of the pdf of the EG-IKum distribution.


Figure 2: Plots of the hrf of the EG-IKum distribution.


Figure 3: Plots of the rhrf of the EG-IKum distribution.
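iii. Moments and related measures


a. The central and non-central moments
The $r$th non-central moment of the EG-IKum$(\theta)$ distribution is given by,

$$
\mu'_r=\prod_{j=1}^{4}\theta_j\sum_{K}B\{(r+1),[\theta_3(k_3+1)-r]\},\qquad r=1,2,\ldots,\quad \theta_3(k_3+1)>r, \tag{70}
$$

where,

$$
\sum_{K}=\sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\sum_{k_3=0}^{\infty}(-1)^{k_1+k_2+k_3}\binom{\theta_1-1}{k_1}\binom{\theta_2(k_1+1)-1}{k_2}\binom{\theta_4(k_2+1)-1}{k_3}, \tag{71}
$$

and $B(.,.)$ is the beta Type II function.
Thus, the mean and variance of the EG-IKum distribution are,

$$
\mu'_1=E(X)=\prod_{j=1}^{4}\theta_j\sum_{K}B\{2,[\theta_3(k_3+1)-1]\}=\prod_{j=1}^{4}\theta_j\sum_{K}\frac{\Gamma[\theta_3(k_3+1)-1]}{\Gamma[\theta_3(k_3+1)+1]}, \tag{72}
$$

$$
\mu'_2=E(X^2)=\prod_{j=1}^{4}\theta_j\sum_{K}B\{3,[\theta_3(k_3+1)-2]\}=\prod_{j=1}^{4}\theta_j\sum_{K}\frac{2\,\Gamma[\theta_3(k_3+1)-2]}{\Gamma[\theta_3(k_3+1)+1]}, \tag{73}
$$

and,

$$
V(X)=\mu'_2-(\mu'_1)^2=\prod_{j=1}^{4}\theta_j\sum_{K}\frac{2\,\Gamma[\theta_3(k_3+1)-2]}{\Gamma[\theta_3(k_3+1)+1]}-\left(\prod_{j=1}^{4}\theta_j\sum_{K}\frac{\Gamma[\theta_3(k_3+1)-1]}{\Gamma[\theta_3(k_3+1)+1]}\right)^{2}. \tag{74}
$$

The coefficient of variation is given by

$$
CV=\frac{\sqrt{V(X)}}{E(X)}=\frac{\sqrt{\prod_{j=1}^{4}\theta_j\sum_{K}\dfrac{2\,\Gamma[\theta_3(k_3+1)-2]}{\Gamma[\theta_3(k_3+1)+1]}-\left(\prod_{j=1}^{4}\theta_j\sum_{K}\dfrac{\Gamma[\theta_3(k_3+1)-1]}{\Gamma[\theta_3(k_3+1)+1]}\right)^{2}}}{\prod_{j=1}^{4}\theta_j\sum_{K}\dfrac{\Gamma[\theta_3(k_3+1)-1]}{\Gamma[\theta_3(k_3+1)+1]}}, \tag{75}
$$

where $\sum_{K}$ is given in (71).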
iv. Mean residual life and mean past lifetime
The mean residual life of the EG-IKum distribution can be obtained as given below:


— (to — Dae tr {[(1 + to) BP] — A). 
ReerK(to; ®) 


SY Seon (yO 


and $R_{EGIK}(t_0;\theta)$ is given by (66).
The mean past lifetime of the component can be obtained as follows:


De tr {[C1 + to) C38-Y] — 1} 
Feerk (tos 8) 
where, >")+ tx is defined in (77) and F'rgjx(to3@) is given by (64). 


m4(to) = 


where, 


m2(to) = 


(70) 


(71) 


(72) 


(73) 


(74) 


(75) 


(76) 


(77) 


(78) 
3.1 Estimation for exponentiated generalized inverted Kumaraswamy distribution based on
dual generalized order statistics


This subsection presents the ML estimators and CIs for the parameters, rf and hrf of the EG-IKum distribution based
on dgos. Also, the shape parameters, rf and hrf of the EG-IKum distribution are estimated using the Bayesian
method. The Bayes estimators are derived under the SE and LINEX loss functions based on dgos. Credible
intervals for the parameters, rf and hrf are derived.


3.1.1 Maximum likelihood estimation 
The ML method is used to estimate the parameters, rf and hrf of the EG-IKum distribution based on dgos.
Also, the asymptotic $100(1-\omega)\%$ CIs for $\theta$ are obtained.

Suppose that $T(1,n,m,k), T(2,n,m,k), \ldots, T(n,n,m,k)$ are $n$ dgos from the EG-IKum distribution; then the likelihood
function can be derived by substituting (64) and (65) in (5) as follows:


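$$
\begin{aligned}
L_{EGIK}(\theta;t)\propto{}&\prod_{j=1}^{4}\theta_j^{\,n}\prod_{i=1}^{n}(1+t_i)^{-(\theta_3+1)}\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4-1}\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1} \\
&\times\prod_{i=1}^{n-1}\left(1-\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right)^{\theta_1(m+1)-1} \\
&\times\left(1-\left\{1-\left[1-(1+t_n)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right)^{\theta_1 k-1}.
\end{aligned}
\tag{79}
$$

The likelihood function can be rewritten as,

$$
L_{EGIK}(\theta;t)\propto\prod_{i=1}^{n}\delta_i\,(Q_i^{*})^{-1}\prod_{j=1}^{4}\theta_j^{\,n}\,e^{\theta_1\left[(m+1)\sum_{i=1}^{n-1}\ln(Q_i^{*})+k\ln(Q_n^{*})\right]}, \tag{80}
$$

where,

$$
\delta_i=(1+t_i)^{-(\theta_3+1)}\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4-1}\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1}, \tag{81}
$$

$$
Q_i^{*}=\left(1-\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right),\qquad i=1,2,\ldots,n. \tag{82}
$$

The natural logarithm of the likelihood function is given by,

$$
\begin{aligned}
\ell=\ln L_{EGIK}(\theta;t)\propto{}& n\ln\theta_1+n\ln\theta_2+n\ln\theta_3+n\ln\theta_4-(\theta_3+1)\sum_{i=1}^{n}\ln(1+t_i) \\
&+(\theta_4-1)\sum_{i=1}^{n}\ln\left[1-(1+t_i)^{-\theta_3}\right]+(\theta_2-1)\sum_{i=1}^{n}\ln\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\} \\
&+[\theta_1(m+1)-1]\sum_{i=1}^{n-1}\ln(Q_i^{*})+(\theta_1 k-1)\ln(Q_n^{*}).
\end{aligned}
\tag{83}
$$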
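Considering that the parameters $\theta_j$, $j = 1,2,3,4$, are unknown and differentiating the log-likelihood function in
(83) with respect to $\theta_j$, one gets,

$$
\frac{\partial\ell}{\partial\theta_1}=\frac{n}{\theta_1}+(m+1)\sum_{i=1}^{n-1}\ln(Q_i^{*})+k\ln(Q_n^{*}), \tag{84}
$$

$$
\begin{aligned}
\frac{\partial\ell}{\partial\theta_2}={}&\frac{n}{\theta_2}+\sum_{i=1}^{n}\ln\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\} \\
&-[\theta_1(m+1)-1]\sum_{i=1}^{n-1}\frac{\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\ln\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\}}{Q_i^{*}} \\
&-\frac{(\theta_1 k-1)\left\{1-\left[1-(1+t_n)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\ln\left\{1-\left[1-(1+t_n)^{-\theta_3}\right]^{\theta_4}\right\}}{Q_n^{*}},
\end{aligned}
\tag{85}
$$

$$
\begin{aligned}
\frac{\partial\ell}{\partial\theta_3}={}&\frac{n}{\theta_3}-\sum_{i=1}^{n}\ln(1+t_i)+(\theta_4-1)\sum_{i=1}^{n}\frac{(1+t_i)^{-\theta_3}\ln(1+t_i)}{1-(1+t_i)^{-\theta_3}} \\
&-(\theta_2-1)\sum_{i=1}^{n}\frac{\theta_4\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4-1}(1+t_i)^{-\theta_3}\ln(1+t_i)}{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}} \\
&+[\theta_1(m+1)-1]\sum_{i=1}^{n-1}\frac{\theta_2\theta_4\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1}\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4-1}(1+t_i)^{-\theta_3}\ln(1+t_i)}{Q_i^{*}} \\
&+\frac{\theta_2\theta_4(\theta_1 k-1)\left\{1-\left[1-(1+t_n)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1}\left[1-(1+t_n)^{-\theta_3}\right]^{\theta_4-1}(1+t_n)^{-\theta_3}\ln(1+t_n)}{Q_n^{*}},
\end{aligned}
\tag{86}
$$

and

$$
\begin{aligned}
\frac{\partial\ell}{\partial\theta_4}={}&\frac{n}{\theta_4}+\sum_{i=1}^{n}\ln\left[1-(1+t_i)^{-\theta_3}\right]-(\theta_2-1)\sum_{i=1}^{n}\frac{\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\ln\left[1-(1+t_i)^{-\theta_3}\right]}{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}} \\
&+[\theta_1(m+1)-1]\sum_{i=1}^{n-1}\frac{\theta_2\left\{1-\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1}\left[1-(1+t_i)^{-\theta_3}\right]^{\theta_4}\ln\left[1-(1+t_i)^{-\theta_3}\right]}{Q_i^{*}} \\
&+\frac{\theta_2(\theta_1 k-1)\left\{1-\left[1-(1+t_n)^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1}\left[1-(1+t_n)^{-\theta_3}\right]^{\theta_4}\ln\left[1-(1+t_n)^{-\theta_3}\right]}{Q_n^{*}},
\end{aligned}
\tag{87}
$$

where $Q_i^{*}$ and $Q_n^{*}$ are given in (82).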
Equating (84) to zero and solving for $\theta_1$, one obtains the ML estimator of $\theta_1$ in terms of the other estimates,

$$
\hat\theta_1=\frac{-n}{(m+1)\sum_{i=1}^{n-1}\ln(\hat{Q}_i^{*})+k\ln(\hat{Q}_n^{*})}, \tag{88}
$$

where,

$$
\hat{Q}_i^{*}=\left(1-\left\{1-\left[1-(1+t_i)^{-\hat\theta_3}\right]^{\hat\theta_4}\right\}^{\hat\theta_2}\right),\qquad i=1,2,\ldots,n. \tag{89}
$$

The system of non-linear equations (84)-(87) can be solved numerically, using the Newton-Raphson method,
to obtain the ML estimates $\hat\theta_1$, $\hat\theta_2$, $\hat\theta_3$ and $\hat\theta_4$.

The invariance property can be used to obtain the ML estimates $\hat{R}_{EGIK}(t)$ and $\hat{h}_{EGIK}(t)$ by replacing the
parameters $\theta_j$, $j = 1,2,3,4$, by their corresponding ML estimates.
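
The log-likelihood (83) and the closed-form expression (88) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the data `ts` are assumed to be the observed dgos listed in order $t_1, \ldots, t_n$, and the defaults `m=-1`, `k=1` correspond to lower record values. A full ML fit would still require a numerical optimiser over the remaining parameters, which is not shown.

```python
import math


def q_star(t, th2, th3, th4):
    """Q* in (82): 1 - {1 - [1 - (1+t)^(-th3)]^th4}^th2."""
    u = 1.0 - (1.0 + t) ** (-th3)
    return 1.0 - (1.0 - u ** th4) ** th2


def loglik(ts, th1, th2, th3, th4, m=-1.0, k=1.0):
    """Log-likelihood (83), up to an additive constant."""
    n = len(ts)
    ll = n * (math.log(th1) + math.log(th2) + math.log(th3) + math.log(th4))
    for t in ts:
        u = 1.0 - (1.0 + t) ** (-th3)
        ll += (-(th3 + 1.0) * math.log(1.0 + t)
               + (th4 - 1.0) * math.log(u)
               + (th2 - 1.0) * math.log(1.0 - u ** th4))
    # The first n-1 observations carry the exponent th1*(m+1) - 1 ...
    ll += (th1 * (m + 1.0) - 1.0) * sum(
        math.log(q_star(t, th2, th3, th4)) for t in ts[:-1])
    # ... and the last observation carries th1*k - 1.
    ll += (th1 * k - 1.0) * math.log(q_star(ts[-1], th2, th3, th4))
    return ll


def theta1_hat(ts, th2, th3, th4, m=-1.0, k=1.0):
    """Closed form (88) for theta_1 given the other three parameters."""
    s = ((m + 1.0) * sum(math.log(q_star(t, th2, th3, th4)) for t in ts[:-1])
         + k * math.log(q_star(ts[-1], th2, th3, th4)))
    return -len(ts) / s
```

Because (83) is concave in $\theta_1$ for fixed $\theta_2, \theta_3, \theta_4$, the value returned by `theta1_hat` maximises `loglik` along that coordinate, which reduces the numerical search to three dimensions.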


Asymptotic variance-covariance matrix of the maximum likelihood estimators 


For large sample sizes, the ML estimators under appropriate regularity conditions are consistent and
asymptotically unbiased as well as asymptotically normally distributed. Therefore, the asymptotic normality
of the ML estimators can be used to compute the two-sided approximate $100(1-\omega)\%$ CIs for $\theta$ as follows:

$$
L_{\theta_j}=\hat\theta_j-Z_{\omega/2}\sqrt{\mathrm{var}(\hat\theta_j)}\qquad\text{and}\qquad U_{\theta_j}=\hat\theta_j+Z_{\omega/2}\sqrt{\mathrm{var}(\hat\theta_j)}. \tag{90}
$$

Also, the asymptotic $100(1-\omega)\%$ confidence intervals for the rf and hrf are,

$$
\hat{R}_{EGIK}(t)\pm Z_{\omega/2}\sqrt{\mathrm{var}(\hat{R}_{EGIK}(t))}\qquad\text{and}\qquad \hat{h}_{EGIK}(t)\pm Z_{\omega/2}\sqrt{\mathrm{var}(\hat{h}_{EGIK}(t))}, \tag{91}
$$

where $Z_{\omega/2}$ is the standard normal percentile and $(1-\omega)$ is the confidence coefficient.

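
The interval in (90) can be computed as in the short Python sketch below. It assumes that an estimate and its asymptotic variance are already available (for instance from the inverse of the observed information matrix, which is not computed here); the function name `wald_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(estimate, variance, omega=0.05):
    """Two-sided approximate 100(1 - omega)% CI: estimate -/+ z * sqrt(variance)."""
    # Upper omega/2 percentile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - omega / 2.0)
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width
```

The same function applies to the rf and hrf in (91) once their asymptotic variances have been approximated.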
3.1.2 Bayesian estimation 


The parameters, rf and hrf are estimated based on dgos using the Bayesian method, under the SE and LINEX loss
functions. Also, the credible intervals are obtained.


a. Point estimation
Let $\theta_1, \theta_2, \theta_3$ and $\theta_4$ be independent random variables with gamma prior distributions. Hence, a joint prior
density function of $\theta = (\theta_1,\theta_2,\theta_3,\theta_4)'$ is given by,

$$
\pi(\theta)\propto\prod_{j=1}^{4}\theta_j^{c_j-1}e^{-d_j\theta_j},\qquad \theta_j, d_j, c_j>0,\ j=1,2,3,4, \tag{92}
$$

where $c_j$, $d_j$ are the hyperparameters.
The joint posterior density can be derived by using (80) and (92) as follows:


Tick (Olt) 
4 n 

=VY [| ge ee | oF (Q7) wt gtstn—4 4 O,[d—(mt) EE In(Q; )-kIn(Qn)| ; (93) 
j t=1 


where ¥ is the normalizing constant and 
co co co 4 i 
w-1 = IG + » | I | git} 6-49; | [e (Q7) -1 
0 Jo Jo \4_3 q 4} 


1 
C1 4+n 


x ———— 
fe (m+1)y"5 ' In(Q; ) eae kIn(Qn)| 


(oe) (oe) (oe) 4 n 
i" cjtn-1 _g.a9. x ey 
a PEE (emf ea 
DO Nee i=1 


i 
————_———_—— er (ae 


[4, — (m+ HYPE n(Q7) — kin(Q@z) |r” 


Let, 


(94) 
Hence, the joint posterior distribution of $\theta$ given $t$ can be written as follows:
TEGIK (9 | t) 


1 
~ oT (G +n) 


4 n 


ae ae he (O37) 196144 g-O1[4s Cnt) LET In(Q7)—KinQa)] |, 


(95) 
where $Q_i^{*}$ and $Q_n^{*}$ are given in (82).


I. Bayesian estimation of exponentiated generalized inverted Kumaraswamy distribution under 
squared error loss function based on dual generalized order statistics 
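Under the SE loss function, the Bayes estimators of the parameters $\theta$ are given by their marginal posterior
expectations using (93) as shown below:

$$
\hat\theta_{j(SE)EGIK}=E(\theta_j\mid t)=\int_{\theta}\theta_j\,\pi_{EGIK}(\theta\mid t)\,d\theta,\qquad j=1,2,3,4, \tag{96}
$$

where,

$$
\int_{\theta}=\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty\qquad\text{and}\qquad d\theta=d\theta_1\,d\theta_2\,d\theta_3\,d\theta_4 .
$$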


The Bayes estimators of the rf and hrf under the SE loss function can be obtained using (66), (67) and 
(95) as follows: 


Riseyeore(t) = E(Recie (t)|t) = 1 - I, (Q*)% miare(Glt) 40 
: (97) 
<7 dO, dO3d04, 


+n-1 _q.g. 
Fee 895 Te F(Q7) 7} 


93 [41—(m+1) YE} In(Q;)-kin(Qy)-In(Q*)| 


Ij=2 8; 


Sly lo ie. 
and 
heseyecix(t) = E(hegeix(t)|t) 


4 


i ie pa aes Te signe 1-0; [di—(m+1) DE} In(Q7)-Kkin(Qn)| do, 
= GE Gees [9 (@i) 


(98) 
where, O* = [1-(1 -(. —-(1 + 1)°)-4)-®], OF and Q; are given in (82) and g} is defined in (94). 
To obtain the Bayes estimates of the parameters, rf and hrf, (96) - (98) should be solved numerically. 


II. Bayesian estimation of exponentiated generalized inverted Kumaraswamy distribution under 
LINEX loss function based on dual generalized order statistics 
Under the LINEX loss function, the Bayes estimators for the shape parameters $\theta$ are given, respectively, by


6(Lwx)EGIK = In E(e Svea ey 


4 n 
Pate i e7 V9; —____ i | [ar ee | a (Q}) —1 gtr tnt 9-6; [ds— Gnd) Fer tn(i)—kin(2)] do\, 
v lo gil (cy +n) 1} J 14 re 


j=12,3A, (99) 
where 6; is given in (81), QO; and Q,' are given in (82) and 7 is defined in (94). 
Also, the Bayes estimators for the rf and hrf based on dgos can be obtained as follows:


2 a : 
Ranyyecin(t) = —InE(e Fea) (t) é 
~ é | (100) 


-1 i 
Set eee aaa 
and 
Aawxyeork(t) = = In E(e. een «) |) 
ae ee (101) 
= Fln[fg eo Veen On(ae) da], 
To obtain the Bayes estimates of the parameters, rf and hrf, Equations (99)—-(101) should be solved numerically. 
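
One common way to evaluate such estimators numerically is to approximate the posterior expectations by averages over posterior draws. The Python sketch below assumes that a list of draws of a single quantity (a parameter, the rf or the hrf) has already been produced by some sampler, which is not shown; the names `bayes_se`, `bayes_linex` and the symbol `nu` for the LINEX shape constant are assumptions of this example. It illustrates the two estimator forms, the posterior mean and $-(1/\nu)\ln E(e^{-\nu\theta}\mid t)$, not the authors' computation.

```python
import math


def bayes_se(draws):
    """Bayes estimate under SE loss: the posterior mean."""
    return sum(draws) / len(draws)


def bayes_linex(draws, nu):
    """Bayes estimate under LINEX loss: -(1/nu) * ln E[exp(-nu * theta)]."""
    scaled = [-nu * x for x in draws]
    # Log-mean-exp with the maximum factored out for numerical stability.
    top = max(scaled)
    log_mean = top + math.log(
        sum(math.exp(v - top) for v in scaled) / len(scaled))
    return -log_mean / nu
```

For positive `nu` the LINEX estimate never exceeds the SE estimate (by Jensen's inequality), and the two approach each other as `nu` tends to zero.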


b. Credible interval 
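Since the posterior distribution is given by (95), a $100(1-\omega)\%$ credible interval for $\theta$ is
$(L(t), U(t))$, where

$$
P[L(t)<\theta<U(t)\mid t]=\int_{L(t)}^{U(t)}\pi_{EGIK}(\theta\mid t)\,d\theta=1-\omega .
$$

Then a $100(1-\omega)\%$ credible interval for $\theta_j$ based on dgos is $(L_j(t), U_j(t))$, where,

$$
P[\theta_j>L_j(t)\mid t]=\int_{L_j(t)}^{\infty}\pi_{EGIK}(\theta_j\mid t)\,d\theta_j=1-\frac{\omega}{2},\qquad j=1,2,3,4, \tag{102}
$$

and

$$
P[\theta_j>U_j(t)\mid t]=\int_{U_j(t)}^{\infty}\pi_{EGIK}(\theta_j\mid t)\,d\theta_j=\frac{\omega}{2},\qquad j=1,2,3,4. \tag{103}
$$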
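
When the marginal posterior is only available through simulated draws, the two tail conditions that define the credible limits can be approximated by empirical percentiles. The Python sketch below is one such approximation; the function name and the linear interpolation between order statistics are choices made for this example and are not taken from the chapter.

```python
def equal_tailed_interval(draws, omega=0.05):
    """Approximate (L, U) with P[theta > L] = 1 - omega/2 and P[theta > U] = omega/2."""
    xs = sorted(draws)
    n = len(xs)

    def percentile(p):
        # Linear interpolation between neighbouring order statistics.
        pos = p * (n - 1)
        lower = int(pos)
        upper = min(lower + 1, n - 1)
        return xs[lower] + (pos - lower) * (xs[upper] - xs[lower])

    return percentile(omega / 2.0), percentile(1.0 - omega / 2.0)
```

The interval length reported alongside the limits is then simply `U - L`.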


3.2 Prediction for exponentiated generalized inverted Kumaraswamy distribution based on 
dual generalized order statistics 


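In this subsection, the ML and Bayesian prediction (point and interval) for a future observation of the EG-IKum
distribution based on dgos are considered under the SE and LINEX loss functions.

Let $T(1,n,m,k), \ldots, T(r,n,m,k)$ be a dgos of size $n$ with the pdf $f(t;\theta)$, and suppose
$Y(1,n_y,m_y,k_y), \ldots, Y(r_y,n_y,m_y,k_y)$, $k_y > 0$, $m_y \in \mathbb{R}$, is a second independent dgos of size $n_y$ of future observations
from the same distribution. Using (6), (7), (64) and (65), the pdf of the dgos $Y_{(s)}$ can be obtained by replacing
$t_{(r)}$ by $y_{(s)}$ as follows:

$$
\begin{aligned}
f_{EGIK}(y_{(s)}\mid\theta)={}&\frac{c_{s-1}}{(s-1)!}\prod_{j=1}^{4}\theta_j\,(1+y_{(s)})^{-(\theta_3+1)}\left[1-(1+y_{(s)})^{-\theta_3}\right]^{\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1}\left(Q_{y_{(s)}}\right)^{\theta_1\gamma_s-1}g_{m_y}^{s-1}[F(y_{(s)})],
\end{aligned}
\tag{104}
$$

where,

$$
Q_{y_{(s)}}=\left(1-\left\{1-\left[1-(1+y_{(s)})^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2}\right), \tag{105}
$$

$c_{s-1}=\prod_{j=1}^{s}\gamma_j$, $g_{m_y}(y_s)=h_{m_y}(y_s)-h_{m_y}(1)$, $\gamma_s=k_y+(n_y-s)(m_y+1)$, for all $1 \le s \le n_y$,
and

$$
g_{m_y}^{s-1}[F(y_{(s)})]=
\begin{cases}
\left[\dfrac{1}{m_y+1}\left(1-\left(Q_{y_{(s)}}\right)^{\theta_1(m_y+1)}\right)\right]^{s-1}, & m_y\neq -1,\\[2ex]
\left[-\theta_1\ln\left(Q_{y_{(s)}}\right)\right]^{s-1}, & m_y=-1.
\end{cases}
\tag{106}
$$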
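For the future sample of size $n_y$, let $Y_{(s)}$ denote the $s$th ordered lifetime, $1 \le s \le n_y$. The pdf of the dgos
$Y_{(s)}$ from the EG-IKum distribution is obtained by substituting (106) in (104).
Case one: for $m_y \neq -1$

$$
\begin{aligned}
f_{EGIK}(y_{(s)}\mid\theta)={}&\xi\prod_{j=1}^{4}\theta_j\,(1+y_{(s)})^{-(\theta_3+1)}\left[1-(1+y_{(s)})^{-\theta_3}\right]^{\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1}\sum_{i_1=0}^{s-1}\eta_{i_1}\left(Q_{y_{(s)}}\right)^{\theta_1\omega_{i_1}-1},\qquad y_{(s)}>0;\ \theta_j>0,
\end{aligned}
\tag{107}
$$

where,

$$
\xi=\frac{c_{s-1}}{(m_y+1)^{s-1}(s-1)!},\qquad \eta_{i_1}=(-1)^{i_1}\binom{s-1}{i_1}\qquad\text{and}\qquad \omega_{i_1}=[\gamma_s+i_1(m_y+1)].
$$

Case two: for $m_y = -1$

$$
\begin{aligned}
f_{EGIK}(y_{(s)}\mid\theta)={}&\frac{\prod_{j=1}^{4}\theta_j\,k_y^{s}\,\theta_1^{s-1}}{(s-1)!}(1+y_{(s)})^{-(\theta_3+1)}\left[1-(1+y_{(s)})^{-\theta_3}\right]^{\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\theta_3}\right]^{\theta_4}\right\}^{\theta_2-1}\left(Q_{y_{(s)}}\right)^{\theta_1 k_y-1}\left[-\ln\left(Q_{y_{(s)}}\right)\right]^{s-1},\qquad y_{(s)}>0;\ \theta_j>0,
\end{aligned}
\tag{108}
$$

where $Q_{y_{(s)}}$ is given in (105).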
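
To make the second case concrete, the following Python sketch evaluates the density of the $s$th future dgos when $m_y = -1$. The function names and the tuple layout `theta = (th1, th2, th3, th4)` are assumptions of this example; with `k_y = 1` the function gives the density of the $s$th future lower record value.

```python
import math


def q_y(y, th2, th3, th4):
    """Q_{y(s)} in (105): 1 - {1 - [1 - (1+y)^(-th3)]^th4}^th2."""
    u = 1.0 - (1.0 + y) ** (-th3)
    return 1.0 - (1.0 - u ** th4) ** th2


def future_dgos_pdf_case_two(y, s, theta, k_y=1.0):
    """Density (108) of the s-th future dgos for m_y = -1."""
    th1, th2, th3, th4 = theta
    u = 1.0 - (1.0 + y) ** (-th3)
    q = q_y(y, th2, th3, th4)
    # Part of the parent pdf that does not involve theta_1.
    kernel = ((1.0 + y) ** (-(th3 + 1.0))
              * u ** (th4 - 1.0)
              * (1.0 - u ** th4) ** (th2 - 1.0))
    const = (th1 * th2 * th3 * th4 * k_y ** s * th1 ** (s - 1)
             / math.factorial(s - 1))
    return const * kernel * q ** (th1 * k_y - 1.0) * (-math.log(q)) ** (s - 1)
```

Setting `s = 1` and `k_y = 1` recovers the parent pdf (65), which gives a quick sanity check of the implementation.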


3.2.1 Maximum likelihood prediction for exponentiated generalized inverted Kumaraswamy distribution 


The ML point and interval predictors are obtained by considering a two-sample prediction based on dgos. The
ML prediction can be derived by using the conditional pdf of the $s$th future dgos, which is given by (107) and
(108), after replacing the parameters $(\theta_j)$ by their ML estimators $(\hat\theta_j)$.


a. Point prediction 
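The MLP of the future dgos $Y_{(s)}$ can be obtained using (107) and (108) as follows:

$$
\hat{y}_{(s)(ML)}=E(y_{(s)}\mid\hat\theta)=\int_0^\infty y_{(s)}\,f(y_{(s)}\mid\hat\theta)\,dy_{(s)}. \tag{109}
$$

Case one: for $m_y \neq -1$

$$
\begin{aligned}
\hat{y}_{(s)(ML)}={}&\int_0^\infty y_{(s)}\,\xi\prod_{j=1}^{4}\hat\theta_j\,(1+y_{(s)})^{-(\hat\theta_3+1)}\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4}\right\}^{\hat\theta_2-1}\sum_{i_1=0}^{s-1}\eta_{i_1}\left(\hat{Q}_{y_{(s)}}\right)^{\hat\theta_1\omega_{i_1}-1}dy_{(s)}.
\end{aligned}
\tag{110}
$$

Case two: for $m_y = -1$

$$
\begin{aligned}
\hat{y}_{(s)(ML)}={}&\int_0^\infty y_{(s)}\,\frac{\prod_{j=1}^{4}\hat\theta_j\,k_y^{s}\,\hat\theta_1^{s-1}}{(s-1)!}(1+y_{(s)})^{-(\hat\theta_3+1)}\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4}\right\}^{\hat\theta_2-1}\left(\hat{Q}_{y_{(s)}}\right)^{\hat\theta_1 k_y-1}\left[-\ln\left(\hat{Q}_{y_{(s)}}\right)\right]^{s-1}dy_{(s)}.
\end{aligned}
\tag{111}
$$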
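b. Interval prediction
The MLPB for the future dgos $Y_{(s)}$, $1 \le s \le n_y$, can be derived from the following probabilities

$$
P[y_{(s)}>L(t)\mid t]=\int_{L(t)}^{\infty}f(y_{(s)}\mid\hat\theta)\,dy_{(s)}=1-\frac{\omega}{2}, \tag{112}
$$

and

$$
P[y_{(s)}>U(t)\mid t]=\int_{U(t)}^{\infty}f(y_{(s)}\mid\hat\theta)\,dy_{(s)}=\frac{\omega}{2}. \tag{113}
$$

Substituting (107) and (108) in (112) and (113), the lower and upper bounds are obtained as given
below:


Case one: for $m_y \neq -1$

$$
\begin{aligned}
P[y_{(s)}>L(t)\mid t]={}&\int_{L(t)}^{\infty}\xi\prod_{j=1}^{4}\hat\theta_j\,(1+y_{(s)})^{-(\hat\theta_3+1)}\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4}\right\}^{\hat\theta_2-1}\sum_{i_1=0}^{s-1}\eta_{i_1}\left(\hat{Q}_{y_{(s)}}\right)^{\hat\theta_1\omega_{i_1}-1}dy_{(s)}=1-\frac{\omega}{2},
\end{aligned}
\tag{114}
$$

and

$$
\begin{aligned}
P[y_{(s)}>U(t)\mid t]={}&\int_{U(t)}^{\infty}\xi\prod_{j=1}^{4}\hat\theta_j\,(1+y_{(s)})^{-(\hat\theta_3+1)}\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4}\right\}^{\hat\theta_2-1}\sum_{i_1=0}^{s-1}\eta_{i_1}\left(\hat{Q}_{y_{(s)}}\right)^{\hat\theta_1\omega_{i_1}-1}dy_{(s)}=\frac{\omega}{2}.
\end{aligned}
\tag{115}
$$

Case two: for $m_y = -1$

$$
\begin{aligned}
P[y_{(s)}>L(t)\mid t]={}&\int_{L(t)}^{\infty}\frac{\prod_{j=1}^{4}\hat\theta_j\,k_y^{s}\,\hat\theta_1^{s-1}}{(s-1)!}(1+y_{(s)})^{-(\hat\theta_3+1)}\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4}\right\}^{\hat\theta_2-1}\left(\hat{Q}_{y_{(s)}}\right)^{\hat\theta_1 k_y-1}\left[-\ln\left(\hat{Q}_{y_{(s)}}\right)\right]^{s-1}dy_{(s)}=1-\frac{\omega}{2},
\end{aligned}
\tag{116}
$$

and

$$
\begin{aligned}
P[y_{(s)}>U(t)\mid t]={}&\int_{U(t)}^{\infty}\frac{\prod_{j=1}^{4}\hat\theta_j\,k_y^{s}\,\hat\theta_1^{s-1}}{(s-1)!}(1+y_{(s)})^{-(\hat\theta_3+1)}\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4-1} \\
&\times\left\{1-\left[1-(1+y_{(s)})^{-\hat\theta_3}\right]^{\hat\theta_4}\right\}^{\hat\theta_2-1}\left(\hat{Q}_{y_{(s)}}\right)^{\hat\theta_1 k_y-1}\left[-\ln\left(\hat{Q}_{y_{(s)}}\right)\right]^{s-1}dy_{(s)}=\frac{\omega}{2}.
\end{aligned}
\tag{117}
$$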
3.2.2 Bayesian prediction for exponentiated generalized inverted Kumaraswamy distribution


Considering a two-sample Bayesian prediction based on dgos for the future observation $Y_{(s)}$, $1 \le s \le n_y$, the
BPD function is,

$$
f_{EGIK}(y_{(s)}\mid t)=\int_{\theta}f_{EGIK}(y_{(s)}\mid\theta)\,\pi_{EGIK}(\theta\mid t)\,d\theta, \tag{118}
$$

where $\pi_{EGIK}(\theta\mid t)$ is the joint posterior distribution of $\theta$ and $f_{EGIK}(y_{(s)}\mid\theta)$ is the pdf of $Y_{(s)}$.


Case one: for $m_y \neq -1$


The BPD of $Y_{(s)}$ given $t$ is obtained by substituting (95) and (107) in (118), hence,


6194-1 
newol=| [are aga : 
EGIK\Y(s)|L) = 

o o Jo Jo gi Ths of (Q7) “24g (119) 


62-1 4 
-0. 4°? cjtn _g.g. 
x{1—[1- 2+) | } | Ge e~49j d0,d03d0,, 
7=2 


—(03+1) 


where, 


ts = [dy — (m+ 1) DET n(Q?) — kln(Qh) — Vio wi, In (ni, B,o.) |» Wie is given in (105), 6% is defined 
in (81), QO; and OQ; are given in (82) and g; is defined in (94). 


Case two: for $m_y = -1$
Substituting (95) and (108) in (118), the BPD of $Y_{(s)}$ given $t$ is obtained as,


sg itaat 
hamtrot)= [f° ete Mito) [I~ 0+¥e) "| 
EGIK\V sy |£ ie fees yi(s— 1)! O53 115;  (Q7) =1p Mt 9)P (6, +n) 


iter -d; 0 


qa 82-1 ats (120) 
x {1 -[1-(t¥) 7] [-n(@..)] Pea +n+s) d62d05d6,, 
where, 


t= [as — (m +1) y%2 In(Q7) — kin(Q%) — kyln (G.)]- Os is given in (105), 6; is defined in (81), 
Q; and Q, are given in (82) and g; is defined in (04), 


a. Point prediction 
The BP of the future dgos $Y_{(s)}$ can be derived under the SE and LINEX loss functions as follows:


Case one: for $m_y \neq -1$
The BP of the future dgos $Y_{(s)}$ can be obtained under the SE loss function using (119) as follows:
Yecik(s)(se) = E(s)|t) = So Yes) fea (st) 4s) 


kT St *) -1 -(C1tn+1) 
P41 ie 1° (Q;) 5 (121) 


@2-1 
-0 84°? cjtn _ayu.p. 
xf1— [1-2 +76) ‘| } j [9 e~ 4/9 d0.d0,d04dy (6). 
7=2 
The BP of the future dgos $Y_{(s)}$ can be obtained under the LINEX loss function using (119) as given below:


z -(63+1) -6,194-1 
: me pe Ee WO (cy +n) (1+ Qs) : E — (1+ %)) ‘] 
vienowwn =f | [ana 
EGIK(S)(LNX) 0; Tt F 5; (07) -1 aie (122) 


ey any 
Xft [1-2 +¥%) | ‘} | [a7"e*" dO2d0,d04d ys), 
=2 
where, 


f= [a1 - (m +1) SEY In(Q7) — kin(Qn) — YF Wi, In (ni, Veo) Qy,) iS given in (105), 6; is defined 
in (81), QO; and Q; are given in (82) and q; is defined in (94). 


Case two: for $m_y = -1$
The BP of the future dgos $Y_{(s)}$ can be obtained under the SE loss function using (120) as follows:
Yecik(s)(se) = E(¥s)|t) 


cj+n 


7 © Ys j=2 6; es & In Ge T(cy +n +s) 
3 i I I I P3(5 — 1)!Q5,, Ter OF (Q7) “gO, +0) 


i=1 1 
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~(63+1) —9,794-1 
x (1+ %s)) [1-G+y%) "| 
phar 
x {1 -[1-(149@) "] } d02d03d0,dy¢6) . 


The BP of the future dgos $Y_{(s)}$ can be obtained under the LINEX loss function using (120) as given below:


Cj ae 0; = os oe 
yz rT: <7 ia a [- { ° Tf 29; fe 818 Ke. A) |- In (2. T(cqy t+n+t+s) 
EGIK(S)(LNX) = —— at oe ic) | 
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-(0 -0,794-1 
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where, T. = [4s —(m+1) DET In(Q7) — kin(Qn) — kyin (SI) Q¥isy is given in (105), 6; is defined 
in (81), QO, and QO, are given in (84) and 9; is defined in (94). 
b. Interval prediction


The BPB of the future dgos $Y_{(s)}$ can be obtained using (119) and (120) as follows:
Case one: for $m_y \neq -1$


P[Yeaixs) > L(¢)|] 


-(0 ~6. 04-1 
ee a fi i +nj(1+y%)) E — (149%) ‘| 
L(t)4o Jo Jo 071 5 (Q7) -1 pen) 
Case two: for $m_y = -1$
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64-1 
dee ak 4 fit? 430, peene -63) 4 
-| | | | ky Wj=28, I (1+) [1-+%) "| 
u(t)Jo Jo Jo @3(s — 1)!Q},. M1 Of (Qf) ret” (cy +) 
(127) 


-65 64 62-1 s-1 
4. Numerical results


This section aims to illustrate the theoretical results of the Bayes estimates and BP under the SE and LINEX
loss functions. Numerical results are presented for the EG-IKum distribution based on lower record values
through a simulation study and three applications.


4.1 Simulation study 


A simulation study is introduced to examine the performance of the Bayes estimates and BP for different 
sample sizes of lower record values and for different parameter values for EG-IKum distribution. 


4.1.1 Bayesian estimation 


The lower record values can be obtained as a special case of dgos by setting $m = -1$ and $k = 1$; therefore,
the estimation results obtained in Subsection 3.1 can be specialized to lower records. The Bayes estimates
of $\theta_j$, $j = 1,2,3,4$, the rf and the hrf, together with their averages and estimated risks (ERs), are
computed based on lower record values according to the following steps:


a) For given values of $\theta_j$, $j = 1,2,3,4$, random samples of size $n$ are generated from the EG-IKum distribution,
observing that if $U_i$ follows the uniform distribution on $(0,1)$, then

$$
t_i=\left\{1-\left[1-\left(1-u_i^{1/\theta_1}\right)^{1/\theta_2}\right]^{1/\theta_4}\right\}^{-1/\theta_3}-1
$$

follows the EG-IKum$(\theta)$ distribution.

b) For each sample of size $n$, take the first observation $t_1$ as the first lower record value and denote it
by $R_1$. If the second observation satisfies $t_2 < t_1$, record it as $R_2$; otherwise ignore it. Repeat until
a sample of Rv records is obtained.

c) The Bayes estimates of the parameters, rf and hrf under the SE and LINEX loss functions are computed at
a specified number of surviving units, with population parameter values $\theta_j$ and hyperparameters of the
prior distribution. The computations are performed using the R programming language.

d) Tables 2 and 3 present the Bayes averages under the SE and LINEX loss functions of the parameters,
their ERs and credible intervals based on lower record values for the population parameter values
$\theta_1 = (0.8, 0.2)$, $\theta_2 = (0.6, 0.3)$, $\theta_3 = (1.2, 0.4)$ and $\theta_4 = (1.5, 0.7)$, based on records of size Rv = 3, 5, 7 and
number of replications (NR) = 10000.

e) Table 4 displays the Bayes averages, ERs and 95% credible intervals of the rf and hrf at $t_0 = 0.5, 1, 2$ from the
EG-IKum distribution based on lower record values for samples of records of size Rv = 3, 7 and
NR = 10000.
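
Steps a) and b) above can be sketched in Python as follows. The study itself reports using R, so this is only an illustrative translation under stated assumptions: the function names are hypothetical, the record rule is the strict inequality "new value smaller than the last record", and a small guard is added to redraw in the rare case where floating-point round-off would make the inverse transform undefined.

```python
import random


def egikum_draw(theta, rng):
    """One EG-IKum variate by the inverse-transform formula of step a)."""
    th1, th2, th3, th4 = theta
    while True:
        u = rng.random()
        inner = 1.0 - (1.0 - u ** (1.0 / th1)) ** (1.0 / th2)
        base = 1.0 - inner ** (1.0 / th4)
        if base > 0.0:  # redraw if round-off collapses the base to zero
            return base ** (-1.0 / th3) - 1.0


def lower_records(xs, max_records=None):
    """Step b): keep an observation only if it is below the last record."""
    records = []
    for x in xs:
        if not records or x < records[-1]:
            records.append(x)
            if max_records is not None and len(records) == max_records:
                break
    return records
```

A usage example: `lower_records([egikum_draw((0.8, 0.6, 1.2, 1.5), rng) for _ in range(1000)], max_records=3)` with `rng = random.Random(1)` returns up to three lower records from one simulated sample; repeating this NR times gives the replications used in the study design.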


4.1.2 Bayesian prediction 
The predictors of the future lower record values can be obtained from the above results of dgos when
$m = -1$, $m_y = -1$, $k = 1$ and $k_y = 1$.
a. Determine the value of $s$, $1 \le s \le n_y$, which is the index of the future unobserved lower record value from
the second sample.
b. The BP for the future lower records is calculated under the SE and LINEX loss functions.


c. Table 5 displays the point predictors and 95% credible intervals for the future lower record values $y_{(s)}$
from the EG-IKum distribution, where Rv = 6, $\theta_1 = 0.8$, $\theta_2 = 0.5$, $\theta_3 = 1.2$ and $\theta_4 = 0.7$.
Table 2: Bayes averages, estimated risks and credible intervals from EG-IKum distribution based on lower records
($\theta_1 = 0.8$, $\theta_2 = 0.6$, $\theta_3 = 1.2$, $\theta_4 = 1.5$ and NR = 10000)

Rv   Loss functions   Parameters    Averages   ER       LL       UL       Length
3    SE               $\theta_1$    0.8013     0.0021   0.7993   0.8994   0.0032
                      $\theta_2$    0.6020     0.0020   0.5999   0.6034   0.0036
                      $\theta_3$    1.1983     0.0016   1.1968   1.2001   0.0033
                      $\theta_4$    1.4987     0.0013   1.4957   1.5004   0.0047
     LINEX            $\theta_1$    0.7979     0.0030   0.7960   0.7999   0.0039
                      $\theta_2$    0.5983     0.0019   0.5958   0.6007   0.0048
                      $\theta_3$    1.2007     0.0006   1.1997   1.2013   0.0015
                      $\theta_4$    1.4994     0.0007   1.4971   1.5006   0.0035
5    SE               $\theta_1$    0.7991     0.0014   0.7973   0.7994   0.0028
                      $\theta_2$    0.6005     0.0009   0.5986   0.6018   0.0032
                      $\theta_3$    1.1991     0.0009   1.1975   1.2000   0.0025
                      $\theta_4$    1.4992     0.0009   1.4969   1.5005   0.0037
     LINEX            $\theta_1$    0.7978     0.0029   0.7963   0.7993   0.0030
                      $\theta_2$    0.5994     0.0008   0.5974   0.6003   0.0029
                      $\theta_3$    1.1994     0.0006   1.1978   1.2000   0.0022
                      $\theta_4$    1.4993     0.0005   1.4980   1.4959   0.0021
7    SE               $\theta_1$    0.8000     0.0004   0.7991   0.8002   0.0013
                      $\theta_2$    0.5994     0.0008   0.5975   0.6005   0.0019
                      $\theta_3$    1.1992     0.0007   1.1980   1.1998   0.0023
                      $\theta_4$    1.4994     0.0006   1.4977   1.5005   0.0018
     LINEX            $\theta_1$    0.8003     0.0005   0.7994   0.8007   0.0013
                      $\theta_2$    0.5990     0.0008   0.5981   0.5998   0.0017
                      $\theta_3$    1.1998     0.0003   1.1988   1.2006   0.0018
                      $\theta_4$    1.4994     0.0005   1.4981   1.5003   0.0018

Table 3: Bayes averages, estimated risks and credible intervals from EG-IKum distribution based on lower records
($\theta_1 = 0.2$, $\theta_2 = 0.3$, $\theta_3 = 0.4$, $\theta_4 = 0.7$ and NR = 10000)

Rv   Loss functions   Parameters    Averages   ER       LL       UL       Length
3    SE               $\theta_1$    0.1990     0.0060   0.1973   0.2002   0.0029
                      $\theta_2$    0.2985     0.0046   0.2969   0.3008   0.0040
                      $\theta_3$    0.3989     0.0035   0.3972   0.4003   0.0031
                      $\theta_4$    0.6980     0.0031   0.6964   0.6997   0.0033
     LINEX            $\theta_1$    0.2027     0.0160   0.1995   0.2046   0.0051
                      $\theta_2$    0.3005     0.0024   0.2987   0.3015   0.0028
                      $\theta_3$    0.4018     0.0050   0.3999   0.4027   0.0028
                      $\theta_4$    0.6996     0.0013   0.6977   0.7004   0.0027
5    SE               $\theta_1$    0.2007     0.0050   0.1993   0.2002   0.0023
                      $\theta_2$    0.2991     0.0028   0.2977   0.3000   0.0022
                      $\theta_3$    0.3990     0.0027   0.3979   0.4002   0.0023
                      $\theta_4$    0.7017     0.0028   0.6996   0.7030   0.0033
     LINEX            $\theta_1$    0.1993     0.0056   0.1975   0.2007   0.0031
                      $\theta_2$    0.3007     0.0024   0.2991   0.3015   0.0024
                      $\theta_3$    0.3998     0.0017   0.3984   0.4009   0.0025
                      $\theta_4$    0.6988     0.0019   0.6975   0.6997   0.0022
7    SE               $\theta_1$    0.1995     0.0031   0.1986   0.1997   0.0017
                      $\theta_2$    0.2991     0.0022   0.2983   0.2998   0.0014
                      $\theta_3$    0.3997     0.0013   0.3987   0.4005   0.0018
                      $\theta_4$    0.7001     0.0007   0.6990   0.7011   0.0021
     LINEX            $\theta_1$    0.1999     0.0023   0.1987   0.2007   0.0020
                      $\theta_2$    0.2992     0.0022   0.2979   0.3000   0.0020
                      $\theta_3$    0.4006     0.0024   0.3995   0.4015   0.0021
                      $\theta_4$    0.7011     0.0019   0.6990   0.7022   0.0022
Table 4: Bayes averages, estimated risks and credible intervals for the rf and hrf at $t_0 = 0.5, 1, 2$, from EG-IKum distribution based on
SE and LINEX loss functions for different sample sizes of records Rv and NR = 10000

Rv   $t_0$   Loss functions   rf and hrf          Averages   ER       LL       UL       Length
3    0.5     SE               $R_{EGIK}(t_0)$     0.7802     0.0014   0.7782   0.7812   0.0030
                              $h_{EGIK}(t_0)$     0.4581     0.0027   0.4557   0.4596   0.0039
             LINEX            $R_{EGIK}(t_0)$     0.7801     0.0012   0.7785   0.7810   0.0025
                              $h_{EGIK}(t_0)$     0.4594     0.0013   0.4586   0.4601   0.0015
     1       SE               $R_{EGIK}(t_0)$     0.6368     0.0006   0.6356   0.6373   0.0017
                              $h_{EGIK}(t_0)$     0.3541     0.0082   0.3507   0.3562   0.0054
             LINEX            $R_{EGIK}(t_0)$     0.6378     0.0020   0.6361   0.6388   0.0026
                              $h_{EGIK}(t_0)$     0.3580     0.0046   0.3560   0.3588   0.0027
     2       SE               $R_{EGIK}(t_0)$     0.4751     0.0015   0.4738   0.4764   0.0025
                              $h_{EGIK}(t_0)$     0.2398     0.0092   0.2382   0.2418   0.0035
             LINEX            $R_{EGIK}(t_0)$     0.4770     0.0038   0.4751   0.4780   0.0029
                              $h_{EGIK}(t_0)$     0.2401     0.0071   0.2391   0.2410   0.0019
7    0.5     SE               $R_{EGIK}(t_0)$     0.7797     0.0007   0.7788   0.7804   0.0016
                              $h_{EGIK}(t_0)$     0.4584     0.0019   0.4569   0.4597   0.0027
             LINEX            $R_{EGIK}(t_0)$     0.7799     0.0009   0.7790   0.7805   0.0014
                              $h_{EGIK}(t_0)$     0.4589     0.0007   0.4581   0.4596   0.0015
     1       SE               $R_{EGIK}(t_0)$     0.6365     0.0005   0.6359   0.6371   0.0012
                              $h_{EGIK}(t_0)$     0.3577     0.0041   0.3556   0.3587   0.0031
             LINEX            $R_{EGIK}(t_0)$     0.6360     0.0012   0.6351   0.6368   0.0017
                              $h_{EGIK}(t_0)$     0.3574     0.0033   0.3562   0.3587   0.0025
     2       SE               $R_{EGIK}(t_0)$     0.4886     0.0010   0.4874   0.4894   0.0020
                              $h_{EGIK}(t_0)$     0.2053     0.0023   0.2045   0.2057   0.0019
             LINEX            $R_{EGIK}(t_0)$     0.4891     0.0037   0.4878   0.4899   0.0037
                              $h_{EGIK}(t_0)$     0.2056     0.0014   0.2048   0.2064   0.0014


Table 5: Point predictors and 95% credible intervals for the future lower record values y_(s) from EG-IKum distribution
(Rv = 6, θ₁ = 0.8, θ₂ = 0.5, θ₃ = 1.2 and θ₄ = 0.7)

s    Loss functions  y_(s)   LL      UL      Length
1    SE              0.8995  0.8977  0.9006  0.0028
     LINEX           0.9004  0.8997  0.9009  0.0012
3    SE              0.9011  0.8990  0.9018  0.0029
     LINEX           0.9009  0.8998  0.9015  0.0017
2    SE              0.9131  0.9097  0.9156  0.0059
     LINEX           0.9091  0.9074  0.9100  0.0025


4.2 Applications 


In this subsection, three applications to real data sets are provided to illustrate the importance, applicability,
and flexibility of the EG-IKum distribution based on lower records. The three applications demonstrate
the superiority of the EG-IKum distribution over some known distributions, namely the IKum and G-IKum
distributions.

Table 6: ML estimates of the parameters and standard errors for the three applications based on lower records

Application  Rv   Parameters  Estimates  Standard Errors
I            3    θ₁          0.8890     0.0007
                  θ₂          0.6578     0.0293
                  θ₃          0.8255     0.0003
                  θ₄          1.6405     0.0018
II           4    θ₁          0.9164     0.0001
                  θ₂          1.8020     0.0304
                  θ₃          0.8763     0.0002
                  θ₄          6.1045     0.0036
III          7    θ₁          0.4745     0.0026
                  θ₂          0.9516     0.0507
                  θ₃          0.5345     0.0030
                  θ₄          0.7715     0.0008


Table 7: ML estimates of rf, hrf and standard errors from EG-IKum distribution for the three applications based on lower records

Application  Rv   rf and hrf       Estimates  Standard Errors
I            3    R_EG-IKum(t₀)    0.8875     0.0001
                  h_EG-IKum(t₀)    0.2618     0.0014
II           4    R_EG-IKum(t₀)    0.7961     0.0033
                  h_EG-IKum(t₀)    0.2042     0.0024
III          7    R_EG-IKum(t₀)    0.8396     0.0011
                  h_EG-IKum(t₀)    0.2261     0.0006


Table 8: Bayes estimates for the parameters and standard errors from EG-IKum distribution for the three applications based on lower
records

Application  Rv   Loss functions  Parameters  Estimates  Standard Errors
I            3    SE              θ₁          0.9009     8.88e-05
                                  θ₂          0.9001     1.01e-04
                                  θ₃          0.8990     1.04e-04
                                  θ₄          1.7008     1.12e-04
                  LINEX           θ₁          0.9007     9.11e-05
                                  θ₂          0.8978     1.84e-04
                                  θ₃          0.9014     1.53e-04
                                  θ₄          1.7012     2.27e-04
II           4    SE              θ₁          0.9009     3.94e-05
                                  θ₂          1.5994     6.25e-05
                                  θ₃          0.8999     5.94e-05
                                  θ₄          6.0491     4.04e-05
                  LINEX           θ₁          0.8996     4.54e-05
                                  θ₂          1.5994     4.33e-05
                                  θ₃          0.9009     6.57e-05
                                  θ₄          6.0501     4.97e-05
III          7    SE              θ₁          0.5996     5.26e-05
                                  θ₂          0.8992     8.39e-05
                                  θ₃          0.4000     6.54e-05
                                  θ₄          0.6982     2.17e-04
                  LINEX           θ₁          0.6031     2.88e-04
                                  θ₂          0.9006     8.60e-05
                                  θ₃          0.4015     1.83e-04
                                  θ₄          0.7004     1.06e-04


Table 9: Bayes estimates for rf, hrf and standard error from EG-IKum distribution for the three applications based on lower records

Application  Loss functions  rf and hrf       Estimates  Standard Errors
I            SE              R_EG-IKum(t₀)    0.9501     8.68e-05
                             h_EG-IKum(t₀)    0.3421     1.49e-04
             LINEX           R_EG-IKum(t₀)    0.9486     8.06e-05
                             h_EG-IKum(t₀)    0.3426     8.46e-05
II           SE              R_EG-IKum(t₀)    0.7871     7.10e-05
                             h_EG-IKum(t₀)    0.0304     2.99e-05
             LINEX           R_EG-IKum(t₀)    0.7867     1.09e-04
                             h_EG-IKum(t₀)    0.0320     3.62e-05
III          SE              R_EG-IKum(t₀)    (missing)  0.0002
                             h_EG-IKum(t₀)    0.2558     0.0002
             LINEX           R_EG-IKum(t₀)    0.7981     0.0001
                             h_EG-IKum(t₀)    0.2587     0.0002


Table 10: Point predictors and 95% credible intervals for the future lower record values y_(s) from the three applications

                  SE                                  LINEX
Application  s    Predictor  LL      UL      Length   Predictor  LL      UL      Length
I            1    1.2007     1.1989  1.2015  0.0026   1.1996     1.1986  1.2013  0.0026
             3    1.2012     1.1994  1.2027  0.0033   1.1999     1.1974  1.2014  0.0040
II           1    1.2020     1.1999  1.2038  0.0039   1.1986     1.1973  1.1995  0.0022
             3    1.2019     1.1993  1.2033  0.0040   1.2017     1.1999  1.2026  0.0027
III          1    0.9008     0.8987  0.9018  0.0030   0.8994     0.8982  0.9002  0.0020
             3    0.90213    0.8999  0.9048  0.0048   0.9013     0.8995  0.9032  0.0036


Tables 6 and 7 display the ML estimates of the parameters, rf and hrf, with their standard errors, for the three real
data sets based on lower records. Tables 8 and 9 present the Bayes estimates of the parameters, rf and hrf, with their
standard errors, for the real data sets based on lower records. Point predictors and 95% credible intervals for
the future lower record values y_(s) from the three real data sets are shown in Table 10.

To check the validity of the fitted model, the Kolmogorov-Smirnov goodness of fit test is performed for
each data set and the p-values in each case indicate that the model fits the data very well. Figure 4 displays
the fitted pdf, PP-plot and Q-Q plot of the EG-IKum distribution for the first real data set. Figure 5 presents the
fitted pdf, PP-plot and Q-Q plot of the EG-IKum distribution for the second real data set. Also, Figure 6 gives
the fitted pdf, PP-plot and Q-Q plot of the EG-IKum distribution for the third real data set, which indicates that
the EG-IKum distribution provides better fits to these data sets.


I. The first application is given by Hinkley (1977). It consists of thirty successive values of March 
precipitation (in inches) in Minneapolis/St Paul. The data is 0.77, 1.74, 0.81, 1.20, 1.95, 1.20, 0.47, 1.43, 
3.37, 2.20, 3.00, 3.09, 1.51, 2.10, 0.52, 1.62, 1.31, 0.32, 0.59, 0.81, 2.81, 1.87, 1.18, 1.35, 4.75, 2.48, 
0.96, 1.89, 0.90, 2.05. 


From the data, one can observe that the following are the lower record values: 0.77, 0.47, 0.32, with
p-value = 0.1866.

Figure 4: The fitted pdf, Q-Q plot, PP-plot, the empirical scaled TTT-transform plot and boxplot of the EG-IKum distribution for the first
application


II. The second application is a real data set obtained from Lee and Wang (2003). It represents the remission 
times (in months) of a random sample of 128 bladder cancer patients. The data is 


0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63, 0.2, 2.23, 0.26, 0.31, 0.73, 0.52, 4.98, 6.97, 9.02, 13.29, 0.4, 
2.26, 3.57, 5.06, 7.09, 11.98, 4.51, 2.07, 0.22, 13.8, 25.74, 0.5, 2.46, 3.64, 5.09, 7.26, 9.47, 14.24, 19.13, 6.54, 
3.36, 0.82, 0.51, 2.54, 3.7, 5.17, 7.28, 9.74, 14.76, 26.31, 0.81, 1.76, 8.53, 6.93, 0.62, 3.82, 5.32, 7.32, 10.06, 
14.77, 32.15, 2.64, 3.88, 5.32, 3.25, 12.03, 8.65, 0.39, 10.34, 14.83, 34.26, 0.9, 2.69, 4.18, 5.34, 7.59, 10.66, 
4.5, 20.28, 12.63, 0.96, 36.66, 1.05, 2.69, 4.23, 5.41, 7.62, 10.75, 16.62, 43.01, 6.25, 2.02, 22.69, 0.19, 2.75, 
4.26, 5.41, 7.63, 17.12, 46.12, 1.26, 2.83, 4.33, 8.37, 3.36, 5.49, 0.66, 11.25, 17.14, 79.05, 1.35, 2.87, 5.62, 
7.87, 11.64, 17.36, 12.02, 6.76, 0.4, 3.02, 4.34, 5.71, 7.93, 11.79, 18.1, 1.46, 4.4, 5.85, 2.02, 12.07. 


From the data, the following are the lower record values: 8, 2.09, 0.2, 0.19, with p-value = 0.4922.

Figure 5: The fitted pdf, Q-Q plot, PP-plot, the empirical scaled TTT-transform plot and boxplot of the EG-IKum distribution for the
second application


III. The third application is the vinyl chloride data obtained from clean upgradient monitoring wells in mg/L;
this data set was used by Bhaumik et al. (2009). The data are: 5.1, 1.2, 1.3, 0.6, 0.5, 2.4, 0.5, 1.1, 8.0, 0.8,
0.4, 0.6, 0.9, 0.4, 2.0, 0.5, 5.3, 3.2, 2.7, 2.9, 2.5, 2.3, 1.0, 0.2, 0.1, 0.1, 1.8, 0.9, 2.0, 4.0, 6.8, 1.2, 0.4, 0.2.


From the original data, one can observe the following lower record values: 5.1, 1.2, 0.6, 0.5, 0.4, 0.2,
0.1, where the p-value = 0.185.

Figure 6: The fitted pdf, Q-Q plot, PP-plot, the empirical scaled TTT-transform plot and boxplot of the EG-IKum distribution for the
third application
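
The extraction of lower record values from a data sequence can be sketched in a few lines. The helper below is only an illustration (the function name `lower_records` is hypothetical and not taken from the chapter); it assumes the usual definition in which the first observation is a record and each later observation is a lower record when it is strictly smaller than the previous record. Applied to the first and third data sets as listed above, it reproduces the record values quoted in the text.

```python
def lower_records(values):
    """Return the lower record values of a sequence, in order of occurrence.

    The first observation is taken as a record; a later observation is a
    lower record if it is strictly smaller than the most recent record.
    """
    records = []
    for v in values:
        if not records or v < records[-1]:
            records.append(v)
    return records


# First application: March precipitation (in inches), Hinkley (1977).
march_precipitation = [
    0.77, 1.74, 0.81, 1.20, 1.95, 1.20, 0.47, 1.43, 3.37, 2.20,
    3.00, 3.09, 1.51, 2.10, 0.52, 1.62, 1.31, 0.32, 0.59, 0.81,
    2.81, 1.87, 1.18, 1.35, 4.75, 2.48, 0.96, 1.89, 0.90, 2.05,
]

# Third application: vinyl chloride data (mg/L), Bhaumik et al. (2009).
vinyl_chloride = [
    5.1, 1.2, 1.3, 0.6, 0.5, 2.4, 0.5, 1.1, 8.0, 0.8, 0.4, 0.6,
    0.9, 0.4, 2.0, 0.5, 5.3, 3.2, 2.7, 2.9, 2.5, 2.3, 1.0, 0.2,
    0.1, 0.1, 1.8, 0.9, 2.0, 4.0, 6.8, 1.2, 0.4, 0.2,
]

print(lower_records(march_precipitation))  # [0.77, 0.47, 0.32]
print(lower_records(vinyl_chloride))       # [5.1, 1.2, 0.6, 0.5, 0.4, 0.2, 0.1]
```

The number of records returned (3 and 7) matches the values of Rv reported for Applications I and III in Table 6.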


• The EG-IKum distribution is compared with some known distributions, namely the IKum and G-IKum
distributions, to demonstrate its superiority. To verify which distribution fits the real data sets better,
the Kolmogorov-Smirnov test is employed. Other criteria, including −2 log(L) (minus twice the maximized
log-likelihood), the Akaike information criterion (AIC), the corrected Akaike information criterion (AICC) and the
Bayesian information criterion (BIC), are used to compare the fit of the competing distributions, where

\[
\mathrm{AIC} = 2k - 2\log(L), \qquad \mathrm{AICC} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \qquad \text{and} \qquad \mathrm{BIC} = k\log(n) - 2\log(L),
\]

where k is the number of parameters in the statistical model, n is the sample size and L is the maximized
value of the likelihood function for the estimated model. The best distribution corresponds to the highest p-value and the
lowest values of −2 log(L), AIC, AICC and BIC.


• Tables 11, 12 and 13 show the p-values of the Kolmogorov-Smirnov test, −2 log(L), AIC, AICC and BIC
for the three real data sets.

Table 11: The goodness of fit measures for fitted models of Application 1

Model     P-value  -2LL    AIC     AICC    BIC
EG-IKum   0.184    7.004   0.995   2.595   6.6001
IKum      0.071    8.646   16.646  18.246  22.25
G-IKum    0.134    11.256  17.256  18.179  21.459


Table 12: The goodness of fit measures for fitted models of Application 2

Model     P-value  -2LL    AIC     AICC    BIC
EG-IKum   0.4922   7.371   15.371  16.971  20.975
IKum      0.1127   14.195  22.195  23.795  27.800
G-IKum    0.0639   21.115  27.115  28.038  31.318


Table 13: The goodness of fit measures for fitted models of Application 3

Model     P-value  -2LL    AIC     AICC    BIC
EG-IKum   0.185    6.5805  14.581  16.181  20.185
IKum      0.083    10.899  18.899  20.499  24.504
G-IKum    0.098    21.366  27.366  28.289  31.569
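
As a quick arithmetic check of the information criteria defined above, the following sketch recomputes AIC, AICC and BIC from a value of −2 log(L). The function name `information_criteria` is hypothetical. The sample size n = 30 used in the example is an assumption: it is not stated in the text, but it is the value that reproduces the tabulated AICC and BIC for the EG-IKum row of Table 12 with k = 4 parameters.

```python
import math


def information_criteria(neg2_loglik, k, n):
    """Return (AIC, AICC, BIC) given -2 log(L), k parameters and sample size n."""
    aic = 2 * k + neg2_loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) + neg2_loglik
    return aic, aicc, bic


# EG-IKum row of Table 12: -2 log(L) = 7.371 with k = 4 parameters.
# n = 30 is an assumed value inferred from the tabulated AICC and BIC.
aic, aicc, bic = information_criteria(7.371, k=4, n=30)
print(f"AIC = {aic:.3f}, AICC = {aicc:.3f}, BIC = {bic:.3f}")
```

Under these assumptions the computed AIC and AICC equal the tabulated 15.371 and 16.971, and the BIC agrees with the tabulated 20.975 up to rounding in the third decimal place.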


4.3 Concluding remarks 


From Tables 2 and 3 one can notice that the ERs of the Bayes estimates of the shape parameters decrease 
when the sample size increases. Also, the lengths of the credible intervals become narrower as the sample 
size of records increases. 


It is clear from Table 4 that the ERs of rf and hrf perform better when the sample size increases, and the 
lengths of the credible intervals get shorter when the sample size increases. 


One can observe that the ERs for the estimates of the parameters, rf and hrf under the LINEX loss
function are generally smaller than the corresponding ERs of the estimates under the SE loss function.


From Table 5, one can observe that the lengths of the BPB increase when s increases. 


Also, the BPB under the LINEX loss function are shorter than the corresponding
BPB under the SE loss function.


Regarding the standard errors for the Bayes estimates of the parameters, rf and hrf in Tables 8 and 9, the 
LINEX loss function seems to perform better than the SE loss function. 


From Table 10, one can notice that the predictive intervals include the predictive values (between the LL 
and UL). Also, the BPs of the future observations are very close to the actual observations. 


From Tables 11, 12 and 13 one can observe that the EG-IKum distribution has the lowest K-S values
and the highest p-values for the three applications. Thus, it provides the best fit for the data compared
to the competing distributions. Moreover, the EG-IKum distribution has the smallest values of
−2 log(L), AIC, AICC and BIC, which implies that the proposed model is the best among the
competing distributions (IKum and G-IKum).

Some suggestions for future research


1. Considering other methods of estimation for the parameters, rf and hrf such as modified maximum
likelihood or modified moments.

2. One sample Bayesian prediction from EG-IKum distribution based on dgos can be obtained.

3. Considering different types of loss functions such as balanced loss functions, i.e., balanced absolute,
balanced binary, balanced modified LINEX and balanced general entropy loss functions.

4. Transmuted EG-IKum distribution may be studied.

5. Recurrence relations for single and product moments of EG-IKum distribution may be derived.

6. E-Bayesian estimation for EG-IKum distribution may be studied.

7. ML and Bayesian estimation for EG-IKum distribution based on Type I and Type II censored samples
can be obtained.

8. Empirical Bayesian for EG-IKum distribution may be studied.


Acknowledgements 


The authors would like to thank the referees and the editor for their comments which led to the improvement 
of the earlier version of this article. 


Abbreviations 

AIC Akaike information criterion 
AICC Akaike information criterion corrected 
BIC Bayesian information criterion 
BPD Bayesian predictive density 

BP Bayes predictors 

BPB Bayes predictive bounds 

cdf Cumulative distribution function 
CIs Confidence intervals 

dgos Dual generalized order statistics 
EG Exponentiated generalized 
EGGC EG general class 

EG-IKum EG inverted Kumaraswamy 

hrf Hazard rate function 

LINEX Linear exponential 

LL Lower limit 

pdf Probability density function 
ML Maximum likelihood 

MLP ML predictors 

MLPB ML predictive bounds 

rhrf Reversed hazard rate function 
rf Reliability function 

SE Squared error 

UL Upper limit

Chapter 11


A New Class of Discrete Distribution 
Arising as an Analogue of Gamma-Lomax 
Distribution: 

Properties and Applications 

Indranil Ghosh, Ayman Alzaatreh and G.G. Hamedani


1. Introduction 


Since the beginning of the First World War (and maybe even prior to that) there have been many industrial
disputes, ranging from low wages to uncomfortable working conditions, that led to several strikes in the UK.
According to the leading daily newspaper in Britain, the Guardian, “the number of workers who went on strike 
in Britain last year fell to the lowest level since the 1890’s”. Furthermore, data from the Office for National 
Statistics in the UK show 33,000 workers were involved in labor disputes in 2017, down from 154,000 a 
year earlier. This is the lowest number since records began in 1893, the year of Britain’s first national coal 
strike, when the figure was 634,000. The major seminal event that is worth mentioning here is the miners’ 
strike of 1984-1985. Indeed, this may be considered as an example of a large scale industrial reaction to stop 
the operation of the British coal industry. For further details on this event, the interested reader may
consult Wikipedia. It is to be noted that many other industries in the UK have suffered major setbacks
in terms of productivity and loss in revenue because of strikes due to various reasons. Needless to say, it
has been a matter of great concern to the industries as to how one can analyze the quantum of these strikes 
and take appropriate measures. There exists a sizable number of research articles in the literature where 
quantitative insights into strikes are discussed. Among them some noteworthy models are those described in 
Velden (2000), Skeels and McGrath (1991), Leigh (1984), Buck (1984), Mauleon and Vannetelbosch (1998) 
and the references cited therein. While a majority of these earlier works focus on establishing a regression 
type modeling in search for a causal relation, a few of the proposed models search for an appropriate discrete 
probability model that might discuss the behaviors and patterns arising due to such strikes. This serves as a 
major motivation to carry out this research work. 

In recent years there has been a growing interest in exploring several discrete distributions in the univariate,
bivariate, and multivariate domains, despite computational complexity and the absence of tractable
probability mass functions. For a non-exhaustive list of references on such developments of discrete
probability models in univariate and in higher domains, the readers are encouraged to see the book by
Johnson et al. (2005) on discrete multivariate distributions; characterizations of recently developed twenty
discrete distributions by Hamedani et al. (2021); a new versatile discrete distribution by Turner (2021); the
new discrete distribution with application to COVID-19 data by Almetwally et al. (2022); a one-parameter
discrete distribution for over-dispersed data by Eliwa and El-Morshedy (2021); a new discrete analog of
the continuous Lindley distribution by Al-Batain et al. (2020); a discrete Pareto (type-IV) distribution by
Ghosh (2020), and the references cited therein. While several of them are developed as discrete analogs
of certain known continuous distributions, such as Pareto (type-IV) and Lindley, several others have been developed
using other techniques of generating a new class of discrete distributions. However, none of the above-mentioned
probability models has been applied to model strike data, which we aim to discuss in this paper.
The main objective of this article is to establish that industrial strikes data, specifically strikes data sets
arising out of several industries in the UK, can be described by a discrete probability model, namely the
discrete gamma-Lomax model. We begin our discussion by providing a general framework which leads to
our specific discrete gamma-Lomax distribution from its continuous analogue model.

Suppose that $F_R(x)$ is the cumulative distribution function and $f_R(x)$ is the probability
density function of a random variable $R$ defined on $[0,\infty)$. The probability density function of the gamma-X
family of distributions defined by Alzaatreh et al. (2013, 2014) is given by,

\[
f_X(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, f_R(x)\,\bigl(-\log(1 - F_R(x))\bigr)^{\alpha-1}\,\bigl(1 - F_R(x)\bigr)^{\frac{1}{\beta}-1}, \quad x \in \mathbb{R}. \tag{1}
\]

If the random variable $R$ follows the Lomax distribution with the density function $f_R(x) = k\theta^{-1}(1 + x/\theta)^{-(k+1)}$,
$k > 0$; $x > 0$, then (1) reduces to the gamma-Lomax distribution (GLD) as,

\[
f(x) = \frac{1}{\theta\,\Gamma(\alpha)\,c^{\alpha}} \left(1 + \frac{x}{\theta}\right)^{-\left(1+\frac{1}{c}\right)} \left[\log\left(1 + \frac{x}{\theta}\right)\right]^{\alpha-1}, \tag{2}
\]

$x > 0$, where $c = \beta/k$, $\alpha$ and $\theta$ are positive parameters.

Note that if $x + \theta$ is replaced by $x$, then (2) reduces to the gamma-Pareto distribution which was proposed
and studied in Alzaatreh et al. (2012a). From (2), the cumulative distribution function of the gamma-Lomax
distribution is given by,

\[
F(x) = \frac{\gamma\{\alpha, c^{-1}\log[1 + x/\theta]\}}{\Gamma(\alpha)}, \quad x \ge 0, \tag{3}
\]

where $\gamma(\alpha, t) = \int_0^t u^{\alpha-1} e^{-u}\,du$ is the incomplete gamma function.

There is a variety of works available in the literature that extends the Lomax distribution under 
the continuous paradigm. For example, (1) was studied by Cordeiro et al. (2015) as a particular case of 
Zografos- Balakrishnan (G) family of distributions, where G is any baseline continuous distribution (from 
the perspective of a gamma generated model). Lemonte and Cordeiro (2013) proposed the McDonald 
Lomax distribution, which is an extension of the classical Lomax distribution. Ghitany et al. (2007) studied 
properties of a continuous probability model derived from the Lomax model and utilized the Marshall and 
Olkin type generator. However, not much work has been done towards discrete Lomax mixture type models. 
In the work of Prieto et al. (2014), the authors considered the discrete generalized Pareto model (mixing 
with zero-inflated Poisson distribution) in modeling road accident black spots data. Ghosh (2020) discussed 
and studied a discrete version of Pareto (Type IV) model and established that the associated p.m.f. can be 
approximately symmetric, left- or right-skewed with applications in modeling several types of count data. 
Dzidzornu et al. (2021) studied the performance of the discrete generalized Pareto models in modeling non- 
insurance claims. Amponash et al. (2021) discussed various estimation strategies regarding several variants 
of discrete Pareto models. These references provide a strong motivation to carry
out this work, which is stated as follows: (a) As pointed out in Hitz et al. (2018), there are several situations
in modeling extreme value events for count data in which the limiting distribution behaves like a generalized
Pareto distribution [in the univariate case, an approach that often works quite well
in practice is to model observations above a large threshold with a parametric family of distributions].
However, not much work
has been done to address the issue with discrete data, and, in particular, if the data can be modeled with a
discrete Lomax distribution, how such extremes for discrete values will behave asymptotically; (b) In income
modeling, if the data (cross-sectional over time) are both discrete and continuous, then for the discrete portion the
DGLD developed in this article can serve as a natural candidate distribution. Therefore, the development of
this discrete analog of a continuous Lomax distribution is of paramount importance. It is observed that the
discrete Lomax model is unimodal and its p.m.f. (probability mass function) is always a decreasing function. So, the
model is somewhat restricted in nature. In contrast, the p.m.f. of the proposed discrete gamma-Lomax distribution (henceforth,
for short, DGLD) is not always decreasing. Consequently, it has greater flexibility.

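The discrete gamma-Lomax distribution can be defined as follows:

\[
\begin{aligned}
g(x) &= P(x \le X < x+1) \\
     &= S(x) - S(x+1) \\
     &= \frac{1}{\Gamma(\alpha)}\left\{\gamma\left(\alpha, c^{-1}\log\left[1+\frac{x+1}{\theta}\right]\right) - \gamma\left(\alpha, c^{-1}\log\left[1+\frac{x}{\theta}\right]\right)\right\},
\end{aligned} \tag{4}
\]

$x \in \mathbb{N}^*$; where $\mathbb{N}^* = \mathbb{N} \cup \{0\}$, and $\mathbb{N}$ is the set of all positive integers.
From (4), the cumulative distribution and the survival functions of the DGLD are, respectively, given by,

\[
G(x) = \frac{\gamma\left\{\alpha, c^{-1}\log\left[1+\frac{\lfloor x\rfloor+1}{\theta}\right]\right\}}{\Gamma(\alpha)}, \quad x \ge 0, \tag{5}
\]

\[
S(x) = \frac{\Gamma\left\{\alpha, c^{-1}\log\left[1+\frac{\lfloor x\rfloor+1}{\theta}\right]\right\}}{\Gamma(\alpha)}, \quad x \ge 0, \tag{6}
\]

where $\Gamma(\alpha, t) = \Gamma(\alpha) - \gamma(\alpha, t)$ is the upper incomplete gamma function and
$\lfloor x\rfloor = \max\{m \in \mathbb{Z} \mid m \le x\}$ is the floor function. The probability mass function in (4) will be utilized for
the maximum likelihood estimation of the parameters, see Section 5. The survival function in (6) is useful
for censored maximum likelihood estimation of the parameters, see Subsection 5.2. Figure 1 shows various
plots of the DGLD, where the scale parameter $\theta = 1$ for various values of the shape parameters $c$ and $\alpha$. The
plots indicate that the DGLD exhibits various shapes including reversed J and right skewed unimodal shapes.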
The rest of the paper is organized as follows. In Section 2, we discuss some structural properties of 
the proposed DGLD including shapes and transformations. Moments and order statistics are discussed in 
Section 3. Section 4 deals with certain characterizations of the DGLD. In Section 5, maximum likelihood 
estimation under regular and censored data setup are discussed, also the performance of the ML method is 
investigated through a simulation study. Three different real life data sets are considered to illustrate the 
applicability of the DGLD in Section 6. Finally, some concluding remarks are provided in Section 7. 


2. Structural properties 


In this Section, we discuss some useful structural properties of the DGLD distribution. The following Lemma 
is useful for simulating random samples from the DGLD. 


Lemma 1.
(a) If the random variable $Y$ has the gamma-Lomax distribution with parameters $c$, $\alpha$ and $\theta$, then the random
variable $X = \lfloor Y \rfloor$ follows the DGLD$(c, \alpha, \theta)$.


(b) If the random variable $Y$ has the gamma distribution with parameters $\alpha$ and $c$, then the random variable
$X = \lfloor \theta(e^{Y} - 1) \rfloor$ follows the DGLD$(c, \alpha, \theta)$.


Proof. The proof follows immediately from (3) and (5).


Figure 1: Plots of DGLD for various values of $c$ and $\alpha$.
The hazard function associated with the DGLD is given by,

\[
h(x) = \frac{\gamma\left\{\alpha, c^{-1}\log\left[1+\frac{x+1}{\theta}\right]\right\} - \gamma\left\{\alpha, c^{-1}\log\left[1+\frac{x}{\theta}\right]\right\}}{\Gamma\left\{\alpha, c^{-1}\log\left[1+\frac{x}{\theta}\right]\right\}}, \quad x \in \mathbb{N}^*, \tag{7}
\]

where $\Gamma(\alpha, t) = \Gamma(\alpha) - \gamma(\alpha, t)$.

Discrete hazard rates originate in various common situations in reliability theory in which clock time is
not the best scale on which to describe lifetime. For example, in ammunition reliability, the number of rounds
fired until the first failure (say) is more important than the failure age. A similar scenario can be envisioned
when a piece of equipment operates in cycles and the observation is the number of cycles successfully
completed prior to the failure. In other situations, a device is monitored only once per time period and the
observation then is the number of time periods successfully completed prior to the failure of the device; for
details, see Shaked et al. (1995) and the references cited therein.


Lemma 2. The discrete gamma-Lomax distribution has a decreasing probability mass function and hence a
decreasing failure rate for $\alpha \le 1$.


Proof. By differentiating (2), it is easy to show that the gamma-Lomax distribution has a decreasing density for $\alpha \le 1$,
and hence, by (4), so does the p.m.f. of the DGLD. Now let
$x_1 < x_2$; then $g(x_2) \le g(x_1)$. Also, by the definition of the survival function we have $S(x_2) \le S(x_1)$, and therefore
$h(x_2) \le h(x_1)$. Hence the proof.


Theorem 1. The DGLD is unimodal, and the mode is $x = m$, where $m \in \{\lfloor x_0\rfloor - 1, \lfloor x_0\rfloor, \lfloor x_0\rfloor + 1\}$ and
$x_0 = \max\{0, \theta\{\exp(c(\alpha - 1)/(c + 1)) - 1\}\}$. Furthermore, if $\lfloor x_0\rfloor = 0$, then the mode $m$ is 0 or 1.


Proof. By differentiating (2), we can see that the gamma-Lomax distribution is unimodal with mode at $x_0 = 0$
for $\alpha \le 1$ and at $x_0 = \theta\{\exp(c(\alpha - 1)/(c + 1)) - 1\}$ for $\alpha > 1$. Consequently, the DGLD is also unimodal with mode
at $m$ as given in Theorem 2 of Alzaatreh et al. (2012b). This completes the proof.
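
The location $x_0$ used in the proof can be verified directly from (2). The following short derivation is a verification sketch rather than part of the original argument; the auxiliary variable $u$ is introduced here only for convenience.

```latex
\begin{align*}
\log f(x) &= \text{const} - \left(1 + \frac{1}{c}\right) u + (\alpha - 1)\log u,
  \qquad u = \log\left(1 + \frac{x}{\theta}\right), \\
\frac{d}{du}\log f &= -\frac{c+1}{c} + \frac{\alpha - 1}{u} = 0
  \;\Longrightarrow\; u_0 = \frac{c(\alpha - 1)}{c + 1}, \\
x_0 &= \theta\left(e^{u_0} - 1\right)
  = \theta\left\{\exp\left(\frac{c(\alpha - 1)}{c + 1}\right) - 1\right\},
  \qquad \alpha > 1.
\end{align*}
```

Since $u$ is increasing in $x$, the sign of the derivative with respect to $x$ is the same as the sign of the derivative with respect to $u$; for $\alpha \le 1$ the derivative is negative for all $u > 0$, so the density is decreasing and the mode is at 0. As a numerical example, for $\alpha = 15$, $c = 0.2$ and $\theta = 1$ one gets $u_0 = 2.8/1.2 \approx 2.333$ and $x_0 \approx 9.31$, so that $\lfloor x_0 \rfloor = 9$ is among the candidate modes named in Theorem 1.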


Remark. The discrete gamma-Lomax p.m.f. can be written as linear combinations of discrete Pareto p.m.f’s. 


Proof of the Remark. We consider the following series expansion, 


xe 


a, x _ es: ine 8 
rt = eG 0 ) casa ( ) 
from Nadarajah and Pal (2008). Also, for any c € R, we have 


a ee ae (3 J" oe ee y" . 6 


k- J k WV"! +, 
where ¢..)=¢| °° || "| Ppp, <1 and, , >. fe OE 195s 
- Ne D ke m=l m+1 fe 


The p.m.f. in (4) can be rewritten as, 


\" conte) x 4] k+m+a i k+m+a@ . 
Wily |p a er 10 
g(x)= Diva j=0 oe 4: a) [toms ( 0 () x eS N*, ( ) 


Equation (10) shows that the DGLD can be written as linear combinations of discrete Pareto distributions
with parameters $k + m + \alpha$ and $\theta$. For more information about the discrete Pareto distribution and some of
its properties, the reader is referred to Buddana and Kozubowski (2014), Krishna and Pundir (2009) and 
Alzaatreh et al. (2012b). 


3. Moments 
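The $r$-th moment of the DGLD is given by,

\[
E(X^r) = \frac{1}{\Gamma(\alpha)} \sum_{x=0}^{\infty} x^r \left\{\gamma\left(\alpha, c^{-1}\log\left[1+\frac{x+1}{\theta}\right]\right) - \gamma\left(\alpha, c^{-1}\log\left[1+\frac{x}{\theta}\right]\right)\right\}. \tag{11}
\]


Theorem 2. If $c < 1/r$, then the $r$-th moment of the DGLD$(c, \alpha, \theta)$ exists.


Proof. Assume that $X$ follows the GLD. Alzaatreh et al. (2012a) showed that if $c < 1/r$, then $E(X^r)$ exists.
Then, the proof immediately follows from the fact that $0 \le \lfloor X \rfloor \le X$.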


Table 1 provides the mean, variance, skewness, kurtosis and the mode of the DGLD with scale parameter
$\theta = 1$, and for various values of the shape parameters $\alpha$ and $c$. For a fixed $\alpha$, the mean, the variance and the
mode of the DGLD are nondecreasing functions of $c$. Also, for a fixed $c$, the mean, the variance and the
mode of the DGLD are nondecreasing functions of $\alpha$. The skewness of the DGLD is always positive, and the
kurtosis value shows that the DGLD possesses a heavy tail characteristic.

Table 1: Mean, variance, skewness, kurtosis and mode of the DGLD for various values of α and c.

α     c     mean     variance   skewness  kurtosis   mode
0.5   0.1   0.0002   0.0002     74.3055   5926.7829  0
      0.15  0.0025   0.0029     24.9902   873.6643   0
      0.2   0.0097   0.0131     17.3414   672.0767   0
1     0.1   0.0010   0.0010     33.6432   1236.0707  0
      0.15  0.0106   0.0125     12.7101   241.0927   0
      0.2   0.0369   0.0525     9.8004    245.2655   0
5     0.1   0.1971   0.2001     2.4807    11.3941    0
      0.15  0.7369   0.9638     2.7579    24.7541    0
      0.2   1.5422   3.6585     5.2599    160.4486   1
10    0.1   1.3662   1.1875     1.8591    11.7731    1
      0.15  3.5800   9.6806     3.8991    53.5608    2
      0.2   7.8133   78.7267    9.3104    906.2119   3
15    0.1   3.3571   4.9135     2.3565    16.1693    2
      0.15  9.9476   79.6694    5.0778    106.7949   5
      0.2   26.9217  1319.0287  16.1348   3555.6579  9


3.1 Order statistics 


Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from the p.m.f. in (4). Then, the p.m.f. of the $i$-th order statistic,
$X_{i:n}$, is given by,


P(X, =x)= = ie = | ta u(1—u)"' du 
“Ge Te y pe oe 
— ; lara ne 
where, 
aes ee 
A(x) = a > y a tog( 1-224) 


and s,, oe ‘k, and p, ; -TT" “E \(k, +a). Now, one can use (11) to obtain a general r* order moment 
of Xn = 

The distribution of maximum and minimum order statistics may be obtained as follows. Let
$X_i$, $i = 1, 2, \ldots, n$, be independent DGLD random variables with parameters $c_i$, $\alpha_i$ and $\theta$. Define
$U = \min(X_1, X_2, \ldots, X_n)$ and $W = \max(X_1, X_2, \ldots, X_n)$. Then the cumulative
distribution function of $U$ can be written as,

\[
P(U \le u) = 1 - \prod_{i=1}^{n}\left\{1 - \frac{1}{\Gamma(\alpha_i)}\,\gamma\left(\alpha_i, c_i^{-1}\log\left[1+\frac{u+1}{\theta}\right]\right)\right\},
\]

where each incomplete gamma function may be expanded using (8).
Hence, the p.m.f. of $U$ can be obtained using $P(U = u) = P(U \le u) - P(U \le u-1)$. Next, by a similar
approach, the cumulative distribution function of $W$ is given by,

\[
P(W \le w) = \prod_{i=1}^{n} \frac{1}{\Gamma(\alpha_i)}\,\gamma\left(\alpha_i, c_i^{-1}\log\left[1+\frac{w+1}{\theta}\right]\right).
\]

For the distribution of the range (R), one can use equations (10) and (11) from Kabe et al. (1969) to find 
the joint distribution of the maximum and the minimum order statistics P(X)... = Xy-:5 Xen = Xpen) AS Well as 
PX» = A ye 


4. Characterizations of the DGLD 


To understand the behavior of the data obtained through a given process, we need to be able to describe this 
behavior via its approximate probability law. This, however, requires establishing conditions which govern 
the required probability law. In other words we need to have certain conditions under which we may be able to 
recover the probability law of the data. So, characterization of a distribution is important in applied sciences, 
where an investigator is vitally interested to find out if the proposed model follows the selected distribution. 
Therefore, the investigator relies on conditions under which the model follows a specified distribution. 
A probability distribution can be characterized in different directions, one being based on the truncated
moments. This type of characterization was initiated by Galambos and Kotz (1978) and followed by other authors
such as Kotz and Shanbhag (1980), Glänzel et al. (1984), Glänzel (1987), Glänzel and Hamedani (2001)
and Kim and Jeon (2013), to name a few. For example, Kim and Jeon (2013) proposed a credibility theory
based on the truncation of the loss data to estimate conditional mean loss for a given risk function. It should
also be mentioned that characterization results are mathematically challenging and elegant. In this section,
we present two characterizations of the DGLD based on: (i) the conditional expectation (truncated moment) of a
certain function of a random variable and (ii) the reverse hazard function. The following lemma is useful for this purpose.


Lemma 3. Let $Z$ be a discrete random variable taking values in the natural numbers $0, 1, 2, \ldots$, with probability
mass and distribution functions $g$ and $G$, respectively. Then the following hold:


(1) $E\{G(Z) + G(Z-1) \mid Z \le k\} = G(k)$, $k = 0, 1, 2, \ldots$,


(2) $\dfrac{g(k+2)}{G(k+1)} - \dfrac{g(k+1)}{G(k)} = \dfrac{G(k+2)G(k) - G^2(k+1)}{G(k)G(k+1)}$, $k = 0, 1, 2, \ldots$


Proof. Straightforward computation with $g(k) = G(k) - G(k-1)$.
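
Both identities in Lemma 3 can be checked numerically for any p.m.f. with finite support. The sketch below is only an illustration (the function name `check_lemma3` and the example p.m.f. are hypothetical); it assumes $g(0) > 0$ so that every $G(k)$ appearing in a denominator is positive, and it reads identity (2) with $g(k+2)/G(k+1) - g(k+1)/G(k)$ on the left-hand side.

```python
from itertools import accumulate


def check_lemma3(pmf, tol=1e-12):
    """Check identities (1) and (2) of Lemma 3 for a pmf on 0, 1, ..., len(pmf) - 1."""
    g = list(pmf)
    G = list(accumulate(g))      # G[k] = P(Z <= k)
    G_prev = [0.0] + G[:-1]      # G_prev[k] = G(k - 1), with G(-1) = 0

    ok = True
    for k in range(len(g)):
        # Identity (1): E[G(Z) + G(Z - 1) | Z <= k] = G(k)
        cond_exp = sum((G[z] + G_prev[z]) * g[z] for z in range(k + 1)) / G[k]
        ok = ok and abs(cond_exp - G[k]) < tol

    for k in range(len(g) - 2):
        # Identity (2): g(k+2)/G(k+1) - g(k+1)/G(k)
        #             = [G(k+2) G(k) - G(k+1)^2] / [G(k) G(k+1)]
        lhs = g[k + 2] / G[k + 1] - g[k + 1] / G[k]
        rhs = (G[k + 2] * G[k] - G[k + 1] ** 2) / (G[k] * G[k + 1])
        ok = ok and abs(lhs - rhs) < tol
    return ok


print(check_lemma3([0.10, 0.20, 0.30, 0.25, 0.15]))
```

Identity (1) works because $(G(z) + G(z-1))\,g(z) = G^2(z) - G^2(z-1)$, so the sum telescopes to $G^2(k)$.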
4.1 Characterization of the DGLD in terms of the conditional expectation of a certain function of a random variable


Proposition 1. Let X :Q-— N*=N U {0} be a random variable. The p.m-f. of X is (4) if and only if, 


efrles-2t form eS 
aroha] 


(12) 
Proof. If X has p.m.f. in (4), then (12) holds by Lemma 3. Conversely, if (12) holds, then, 


- “4 x+l1 4 x 
> [fa jg *4] yf ane e+} eco] a) 
4 x+1 
= Gtby a c log| +=" 


From (13), we also have, 


Yee Mog [eth rane" tee f+ ]} ec} 


= {G(k)- g(x)} rae" log] +4} 


where we used G(k — 1) = G(k) — g(k). 
Now, subtracting (14) from (13), yields, 


faa £0 
confit foes 


From the above equality, we have, 
ra c' log [ + “|| 
g(x) __| 0 


oe) rac “og| + #211) | 


which is the reverse hazard function of the random variable X with the p.m.f. in (4). 


(14) 


rG(k) 


4.2 Characterization of the DGLD in terms of the reverse hazard function 


Proposition 2. Let X : Ω → N* = N ∪ {0} be a random variable. The p.m.f. of X is 
(4) if and only if its reverse hazard function, r_G(k), satisfies the difference equation, 


y{ae"tog| +E [Pyare “og| 1+ #22 y *fane" tog] + 1) 
, (15) 
rane tog] +1] ane “og|1+£22 f 


r_G(k + 1) − r_G(k) = 


with initial condition r_G(0) = 1. 
Proof. If X has a p.m.f. in (4), then (15) holds by Lemma 3. Now, if (15) holds, then for x ∈ N, we have, 


x-1 


Slik +) -r6(0)} 


k=0 


k k+l 
; ,¢ log} 14+— ,¢ log} 1+—— 
3 rfavetoalig freee 5] 
or 


r_G(x) − r_G(0) = − 


or, in view of r_G(0) = 1, we have, 


r(x) =1 


5. Estimation 


5.1 Regular maximum likelihood estimation 


In this subsection, we consider the maximum likelihood estimation method in order to estimate the model 
parameters of the DGLD. The maximum likelihood estimators (MLEs) enjoy desirable properties and can be 
used to construct confidence intervals and regions, and also test statistics. The large sample asymptotics for 
the estimators obtained in this approach, under mild regularity conditions, can be easily handled numerically. 
To apply the method of maximum likelihood for estimating the parameter vector (c, a, θ)' of the DGLD, we 
assume that x = (x_1, x_2, ..., x_n)^T is a random sample of size n from X ~ DGLD(c, a, θ). The log-likelihood 
function becomes, 


" x, +1 7 et Bi 
teres oem formal 


The above equation can be maximized using available statistical software such as R (optim function) 
and SAS (PROC NLMIXED), or by solving the nonlinear likelihood equations obtained by differentiating 
the likelihood function. Regarding interval estimation and hypothesis tests, one may use standard likelihood 
techniques based on the observed Fisher information matrix, since in this case it is difficult to obtain the expected 
values of the elements of the Fisher information matrix. For example, the asymptotic covariance matrix of 
the MLE can be approximated by the inverse of the observed Fisher information matrix evaluated at the MLE. One may 
consider appropriate likelihood ratio (LR) tests for the model parameters of the DGLD. 


5.2 Censored maximum likelihood estimation 


One may also consider estimation under censoring. Censoring is common in lifetime data sets. There are 
many types of censoring: type I censoring, type II censoring, and others. A general form known as 
multi-censoring can be described as follows: there are m lifetimes of which 


* m_0 have failed at times T_1, ..., T_{m_0}; 
* m_1 have failed at times belonging to (S_{i−1}, S_i], i = 1, ..., m_1; 
* m_2 have survived the times R_i, i = 1, ..., m_2, but have no longer been observed. 


It is obvious that m = m_0 + m_1 + m_2. 
For the multi-censoring data, the associated log likelihood function will be, 


log L(c,a@,0) 

fobented] 
feeb 
Se Mart ea | 


The maximum likelihood estimators of c, a and θ can be obtained by maximizing the above function. 


mo T 
=—(m, +m, )log(T'(a)) Zi {ee og / 
5.3 Simulation study 


To evaluate the performance of the MLE method, a simulation study is conducted for a total of eighteen 
parameter combinations and the process is repeated 3000 times. Three different sample sizes n = 100, 200 
and 500 are considered. The bias (estimate − actual) and the root mean square error (RMSE) of the parameter 
estimates for the MLE are presented in Tables 2, 3 and 4 for n = 100, 200 and 500, respectively. It is noted from the tables that the bias 
and RMSE for the parameter c are in most cases higher than the bias and RMSE for a and θ. In general, the 
ML method performs well in estimating the DGLD parameters. As expected, reduction in the bias and the 
RMSE values are observed for all parameter combinations with an increase in the sample size. 
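
The bias and RMSE summaries used in such a study can be sketched as follows. This is only an illustration of the procedure, not the authors' simulation: a geometric distribution on 0, 1, 2, ... (whose MLE has a closed form) stands in for the DGLD, and the function names, seed and number of replications are assumptions.

```python
import math
import random

def draw_geometric(p, n, rng):
    """n draws from the geometric p.m.f. p(1-p)^x, x = 0, 1, 2, ..., by inversion."""
    return [int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) for _ in range(n)]

def geometric_mle(sample):
    """Closed-form MLE of p for the geometric distribution on 0, 1, 2, ..."""
    return 1.0 / (1.0 + sum(sample) / len(sample))

def bias_and_rmse(p, n, reps=500, seed=2024):
    """Monte Carlo bias (estimate - actual) and RMSE of the MLE for sample size n."""
    rng = random.Random(seed)
    errors = [geometric_mle(draw_geometric(p, n, rng)) - p for _ in range(reps)]
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return bias, rmse

# Same three sample sizes as in the study; the RMSE should shrink as n grows.
for n in (100, 200, 500):
    print(n, bias_and_rmse(0.3, n))
```

For the DGLD itself, the closed-form estimator would be replaced by a numerical maximization of the log-likelihood, and the loop would run over the eighteen parameter combinations.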


6. Application 


In this Section, the discrete gamma-Lomax distribution is applied to several data sets. These data sets are 
taken from Consul (1989). The three data sets represent the observed frequencies of the number of outbreaks 
of strike in three leading industries in the U.K. during 1948-1959. These industries are Coal-mining, Vehicle 
manufacturing and Transportation. The data are depicted in Tables 5, 7 and 9. Consul (1989) fitted the data 
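for the three industries to the generalized Poisson distribution (GPD) with p.m.f. given by, 

$$P_x(\theta, \lambda) = \begin{cases} \theta(\theta + \lambda x)^{x-1} e^{-\theta - \lambda x} / x!, & x = 0, 1, 2, \ldots, \\ 0, & \text{for } x > m \text{ if } \lambda < 0, \end{cases}$$

where θ > 0, max(−1, −θ/m) < λ < 1 and m ≥ 4 is the largest positive integer for which θ + mλ > 0 when λ < 0. 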
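
As an illustration of how expected frequencies under the GPD are obtained, here is a minimal sketch of the p.m.f. above. The function name is hypothetical; the truncation for negative λ is implemented by returning zero whenever θ + λx is not positive, and the computation is done on the log scale to avoid overflow for large x.

```python
import math

def gpd_pmf(x, theta, lam):
    """Generalized Poisson p.m.f.: theta*(theta+lam*x)^(x-1)*exp(-theta-lam*x)/x!."""
    if x < 0:
        return 0.0
    base = theta + lam * x
    if base <= 0.0:
        # Only reachable when lam < 0: the support is truncated at m.
        return 0.0
    log_p = math.log(theta) + (x - 1) * math.log(base) - base - math.lgamma(x + 1)
    return math.exp(log_p)

# Expected frequencies for 156 observations, using the GPD estimates reported
# for the vehicle-manufacture data (lambda = 0.144, theta = 0.351).
expected = [156 * gpd_pmf(x, 0.351, 0.144) for x in range(4)]
print([round(e, 2) for e in expected])
```

With these estimates the first two expected frequencies agree with the GPD column of Table 7 up to the rounding of the published parameter values.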
The results showed that the GPD does not provide an adequate fit to the coal-mining industry data sets. 
From the goodness of fit summary values, it appears that the discrete gamma-Lomax distribution provides a 




Table 2: Bias and RMSE for the parameter estimates using MLE method for n = 100. 


Actual values          Bias                           RMSE 
c     a     θ       ĉ         â         θ̂         ĉ        â        θ̂ 
0.1   1     1     0.1423   -0.1238    0.0877    0.8850   0.1614   0.2210 
            2     0.1268    0.0492    0.1529    0.8566   0.1271   0.4352 
      5     1     0.1311    0.0568    0.0098    1.3633   0.2229   0.1150 
            2     0.1573    0.0904    0.0538    1.1027   0.2826   0.2184 
      10    1    -0.1407    0.2693    0.0253    0.9337   0.6942   0.0408 
            2    -0.1528    0.2781    0.0161    0.8668   0.7487   0.0652 
0.5   1     1    -1.1769    0.1524    0.2709    1.8470   0.2067   0.4075 
            2    -1.3217    0.1660    0.3520    1.8728   0.2259   0.6470 
      5     1    -0.4169    0.2123    0.0616    2.3357   0.4445   0.1552 
            2    -1.0375    0.3312    0.2132    2.3711   0.5265   0.3633 
      10    1     1.5882    1.2261    0.6632    1.9993   1.5346   0.7292 
            2    -1.0357    1.0231    0.0587    1.7280   1.4654   0.3731 
1     1     1    -1.2633    0.1524    0.2709    1.8949   0.1957   0.4302 
            2    -1.3518    0.1660    0.3802    2.0138   0.2340   0.5467 
      5     1     0.4145    0.2123    0.1533    2.1962   0.4556   0.2321 
            2     1.1132    0.3351    0.5591    2.4516   0.5378   0.9881 
      10    1    -1.8528    1.2261    0.1463    2.2154   1.5356   0.3470 
            2     1.2936    1.0231    0.0578    1.9173   1.5127   0.7002 

Table 3: Bias and RMSE for the parameter estimates using MLE method for n = 200. 


Actual values          Bias                           RMSE 
c     a     θ       ĉ         â         θ̂         ĉ        â        θ̂ 
0.1   1     1     0.0357    0.0064    0.0119    0.4013   0.0436   0.0806 
            2    -0.0937    0.0136    0.0217    0.4466   0.0522   0.1959 
      5     1     0.0214    0.0169    0.0052    0.4376   0.0936   0.0431 
            2     0.1022    0.0141    0.0326    0.4025   0.0939   0.0923 
      10    1    -0.0523    0.0738    0.0152    0.4155   0.2356   0.0211 
            2    -0.1205    0.1017    0.0138    0.3390   0.2687   0.0297 
0.5   1     1    -0.2371    0.0315    0.0579    0.5312   0.0893   0.1696 
            2    -0.1534    0.0265    0.1020    1.0196   0.0788   0.3047 
      5     1    -0.2108    0.0607    0.0214    1.0525   0.1881   0.0849 
            2    -0.0999    0.0448    0.0352    1.2412   0.1231   0.1849 
      10    1    -0.4528    0.0952    0.2354    0.9858   0.1830   0.4751 
            2    -0.6322    0.3218    0.0172    0.9756   0.4576   0.0250 
1     1     1     0.4482    0.1293   -0.0346    0.9925   0.1877   0.2403 
            2     0.4051   -0.1104   -0.0836    0.6944   0.1652   0.4956 
      5     1     0.3167   -0.0348   -0.0574    0.8260   0.4455   0.1292 
            2     0.8452   -0.0698   -0.1348    1.0160   0.3666   0.4882 
      10    1     0.9452    0.1726   -0.1152    1.1006   0.3991   0.6132 
            2     0.5367   -0.1628   -0.0243    0.5967   0.5884   0.0282 


Table 4: Bias and RMSE for the parameter estimates using MLE method for n = 500. 


Actual values          Bias                           RMSE 
c     a     θ       ĉ         â         θ̂         ĉ        â        θ̂ 
0.1   1     1     0.0237   -0.0031    0.0219    0.0238   0.0116   0.0228 
            2     0.0241   -0.0085   -0.0135    0.0242   0.0128   0.0136 
      5     1     0.0187   -0.0104   -0.0028    0.0188   0.0417   0.0030 
            2     0.0153   -0.0102   -0.0157    0.0154   0.0412   0.0157 
      10    1     0.0204   -0.0201   -0.0079    0.0205   0.3461   0.0090 
            2     0.0197    0.0845   -0.0102    0.0200   0.3736   0.0102 
0.5   1     1     0.0416   -0.0129   -0.0110    0.0424   0.0131   0.0111 
            2     0.0340   -0.0083   -0.0068    0.0349   0.0086   0.0084 
      5     1     0.0470    0.0260   -0.0151    0.0479   0.0269   0.0157 
            2     0.0470    0.0260   -0.0227    0.0477   0.0263   0.0227 
      10    1     0.0504    0.0882   -0.0324    0.0514   0.1115   0.0327 
            2     0.0577    0.0595   -0.0911    0.0592   0.0913   0.0911 
1     1     1     0.3426   -0.0513   -0.0210    0.3752   0.0520   0.1336 
            2     0.5548   -0.0131   -0.0453    0.5708   0.0147   0.2322 
      5     1     0.2893   -0.0123    0.0289    0.3457   0.0356   0.0845 
            2     0.1749    0.0321    0.0578    0.1993   0.0386   0.0660 
      10    1     0.3584    0.1494    0.0279    0.3820   0.1514   0.0962 
            2     0.3325    0.1494    0.0095    0.3758   0.1690   0.0381 


Table 5: The number of outbreaks of strike in the coal-mining industry in UK. 


x-value   Observed   Three-parameter DGLD   Two-parameter DGLD   GPD 
0         46         45.99                  45.21                50.01 
1         76         75.59                  78.30                65.77 
2         24         26.23                  23.74                32.23 
3         9          6.32                   6.11                 7.23 
>=4       1          1.87                   2.64                 0.76 
Total     156        156                    156                  156 


Table 6: The estimated parameters and goodness of fit for the outbreaks of strike in the coal-mining industry in UK data. 


Model                  Parameters     K-S      W        χ²       df   χ² p-value 
Three-parameter DGLD   â = 4.5109     0.0116   0.0027   1.7279   1    0.1887 
                       ĉ = 0.0468 
                       θ̂ = 6.2260 
Two-parameter DGLD     â = 8.8492     0.0105   0.0078   2.4755   2    0.2900 
                       ĉ = 0.0989 
GPD                    λ̂ = -0.1450    0.0400   0.1194   4.5234   2    0.0334 
                       θ̂ = 1.1377 


Table 7: The number of outbreaks of strike in the vehicle-manufacture industry in UK. 


x-value   Observed   Three-parameter DGLD   Two-parameter DGLD   GPD 
0         110        109.90                 109.64               109.82 
1         33         33.43                  34.73                33.36 
2         9          8.98                   7.77                 9.24 
3         3          2.54                   3.86                 3.58 
>=4       1          1.15                   (pooled with x = 3)  (pooled with x = 3) 
Total     156        156                    156                  156 


Table 8: The estimated parameters and goodness of fit for the outbreaks of strike in the vehicle-manufacture industry in UK data. 


Model                  Parameters     K-S      W        χ²       df   χ² p-value 
Three-parameter DGLD   â = 1.2332     0.0341   0.0774   0.1082   1    0.7422 
                       ĉ = 0.0546 
                       θ̂ = 11.5823 
Two-parameter DGLD     â = 2.9637     0.0088   0.0021   0.2882   1    0.5914 
                       ĉ = 0.1931 
GPD                    λ̂ = 0.144      0.0727   0.1225   0.0600   1    0.8065 
                       θ̂ = 0.351 

Table 9: The number of outbreaks of strike in the transport industry in UK. 


x-value   Observed   Three-parameter DGLD   Two-parameter DGLD   GPD* 
0         114        114.20                 114.15               114.41 
1         35         34.13                  33.97                26.01 
2         4          5.58                   5.85                 4.83 
3         2          1.35                   1.36                 0.85 
>=4       1          0.74                   0.67                 9.88 
Total     156        156                    156                  156 


Table 10: The estimated parameters and goodness of fit for the outbreaks of strike in the transport industry in UK data. 


Model                  Parameters     K-S      W        χ²       df   χ² p-value 
Three-parameter DGLD   â = 7.6182     0.0058   0.0006   0.8707   1    0.3508 
                       ĉ = 0.1569 
                       θ̂ = 0.3166 
Two-parameter DGLD     â = 3.6368     0.0062   0.0008   1.0772   2    0.5836 
                       ĉ = 0.1522 
GPD*                   λ̂ = 0.098      0.0155   0.0055   12.788   2    0.0017 
                       θ̂ = 0.31 


good fit to all data sets. Since the GPD has two parameters, we fit the data sets to the two-parameter DGLD 
(the scale parameter θ = 1) and the three-parameter DGLD. Tables 6, 8 and 10 show the results of fitting 
these data sets to the two-parameter discrete gamma-Lomax, three-parameter discrete gamma-Lomax and the 
generalized Poisson distribution. The method of maximum likelihood is applied to estimate the parameters 
for the assumed discrete probability models. 

Arnold and Emerson (2011) proposed discrete analogues of the Cramér-von Mises and Kolmogorov-Smirnov 
goodness of fit statistics. In order to assess the goodness of fit for the fitted models, the discrete 
Kolmogorov-Smirnov (K-S), discrete Cramér-von Mises (W) and chi-square (χ²) goodness of fit statistics 
for the fitted distributions are reported in Tables 6, 8 and 10. 

Notice that the (*) values in Tables 9 and 10 are different from the computed values in Consul (1989, 
p. 120). From Tables 6, 8 and 10, it appears (based on the χ² p-value) that the GPD provides an adequate fit to 
the vehicle manufacturing industry data but does not provide an adequate fit to the coal-mining and transport 
industry data. The two-parameter and three-parameter DGLD provide an adequate fit to all data sets. For all data 
sets, the values of the discrete K-S and W statistics for the DGLDs are smaller than the ones obtained from the GPD. In 
addition, it can be observed that the DGLD fits the left and right tails of the three data sets well. 
These results reaffirm that the DGLD can provide an adequate fit to industrial strike data sets. Consequently, 
the DGLD can be used as a baseline distribution for modeling the number of strikes in industries. 
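
The χ² statistics and p-values in Tables 6, 8 and 10 can be checked with a few lines of code. The sketch below is an illustration under assumptions: it uses the coal-mining frequencies with the x = 1 count read as 76 (the value that makes the observed column total 156), takes df as the number of cells minus one minus the number of fitted parameters, and handles only df = 1 and df = 2, for which the χ² tail probability has a closed form.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

# Coal-mining data: observed counts and three-parameter DGLD expected counts.
observed = [46, 76, 24, 9, 1]
expected = [45.99, 75.59, 26.23, 6.32, 1.87]
stat = pearson_chi2(observed, expected)
df = len(observed) - 1 - 3  # five cells, three estimated parameters
print(round(stat, 3), df, round(chi2_sf(stat, df), 3))
```

The statistic computed from the rounded expected frequencies differs slightly from the tabulated value because the published frequencies are rounded to two decimals.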


7. Conclusion 


In this paper, we have proposed and derived some distributional properties of a new discrete analogue of 
the continuous gamma-Lomax distribution (DGLD). From Figure 1, it appears that the DGLD offers great 
flexibility in terms of shapes for the probability mass function. The real data application section shows that 
DGLD can be useful in fitting various strike data sets and provides a better alternative to the existing GPD 
probability model. We have also discussed the estimation of the parameters in the standard situation (with all 
the data available), and also under the multi-censoring setup. An extension of the proposed DGLD to two or 
multi-dimensional setups can be a possible future research topic. 
Chapter 12 


New Compounding Lifetime Distributions 
with Application to Hard Drive Reliability 


A Asgharzadeh,' Hassan S Bakouch,”? L Esmaeili' and S Nadarajah** 


1. Introduction 


1.1 Main problem 


The following are failure times in days of hard drives reported in https://www.backblaze.com/hard-drive-test-data.html: 

0.9690833 0.8882083 0.6025417 0.6353333 0.7610833 
0.7613750 0.9068333 0.7614583 0.9068333 0.6356667 
0.9365833 0.7615417 0.9365417 0.7490000 0.7292500 
0.7292083 0.7440000 0.7489583 0.7293333 0.7429583 


For computational stability, we have divided each observation by 1000. Some summary statistics of the data 
are: minimum = 0.147, first quartile = 0.748, median = 0.880, mean = 0.825, third quartile = 0.825, 
maximum = 0.990, interquartile range = 0.197, range = 0.843, skewness = −1.914, kurtosis = 8.036, 
10th percentile = 0.646, 5th percentile = 0.595, 1st percentile = 0.153, 90th percentile = 0.969, 95th 
percentile = 0.989 and 99th percentile = 0.989. A histogram of the data is shown later in Figure 5. 

The problem studied in this paper is to find a simple yet accurate model for the distribution of these 
failure times. This is an important problem. An accurate modeling of failure times can lead to the production 
of more robust hard drives. We are not aware of any paper modeling failure times of hard drives, at least 
not with physically motivated models. 




1.2 Proposed model 


We can suppose that a hard drive consists of components working in series (i.e., the hard drive will fail if 
and only if any one of its components fails) or that a hard drive consists of components working in parallel 
(i.e., the hard drive will fail if and only if all of its components fail). We can also suppose a mixture of series 
and parallel systems for the components of the hard drive, but that would complicate things. Of the two 
scenarios stated, the former is the more reasonable one. So, we shall suppose from now on that a hard drive 
consists of, say, N independent components working in series; the assumption of independence is made for 
simplicity. The value of N may vary depending on the type of hard drive, manufacturer and other factors 
like length and weight. We can take N to follow a discrete uniform, geometric, binomial, negative binomial 
or Poisson distribution. The simplest model for N is a discrete uniform distribution with probability mass 
function Pr(N = n) = 1/k, n = 1, 2, ..., k, where k = 1, 2, ... is an unknown parameter. For studies of 
systems similar to that of a hard drive, we refer the readers to Kumar and Saini (2014), Saini and Kumar 
(2020) and Saini et al. (2020). 

Let X_1, X_2, ..., X_N denote the failure times of the N components. Then the failure time of the hard drive 
is X = min(X_1, X_2, ..., X_N). Suppose X_1, X_2, ... are independent and identical random variables with 
cumulative distribution function (cdf) and probability density function (pdf) specified by G(·; α) and g(·; α), 
respectively, where α are some unknown parameters. Suppose X_1, X_2, ... are independent of N. Assume 
throughout that G(·; α) and g(·; α) are absolutely continuous functions. It is easy to show that the cdf of X 
is, 

$$F(x) = 1 - \frac{\bar{G}(x)\left[1 - \bar{G}^{k}(x)\right]}{k\left[1 - \bar{G}(x)\right]} = 1 - w\left(\bar{G}(x)\right) \qquad (1)$$

for x > 0 and k = 1, 2, ..., where $\bar{G}(x) = 1 - G(x)$ and 

$$w(t) = \frac{t\left(1 - t^{k}\right)}{k(1 - t)}, \qquad (2)$$

which is also the probability generating function of N. The pdf of X is, 

$$f(x) = g(x)\,\frac{dw\left(\bar{G}(x)\right)}{d\bar{G}(x)} = g(x)\,w'\left(\bar{G}(x)\right),$$

where, 

$$w'(t) = \frac{dw(t)}{dt} = \frac{1 - (k+1)t^{k} + kt^{k+1}}{k(t - 1)^{2}}. \qquad (3)$$

The pdf of X can be reexpressed as, 

$$f(x) = g(x)\,\frac{1 - (k+1)\bar{G}^{k}(x) + k\bar{G}^{k+1}(x)}{k\left[\bar{G}(x) - 1\right]^{2}}. \qquad (4)$$

The hazard rate function of X can be expressed in the forms, 

$$h(x) = g(x)\,\frac{w'\left(\bar{G}(x)\right)}{w\left(\bar{G}(x)\right)} = -\frac{d}{dx}\log\left[w\left(\bar{G}(x)\right)\right]$$

or 

$$h(x) = h_G(x)\,\frac{1 - (k+1)\bar{G}^{k}(x) + k\bar{G}^{k+1}(x)}{\left[1 - \bar{G}(x)\right]\left[1 - \bar{G}^{k}(x)\right]}, \qquad (5)$$

where $h_G(x) = g(x)/\bar{G}(x)$. Hereafter, a random variable X with cdf given by (1) shall be denoted by 
X ~ GU(α, k). We shall suppose throughout that k is a positive real parameter, equivalent to approximating 
the discrete uniform random variable by a continuous uniform random variable. Approximating discrete 
random variables by continuous ones is a common practice. Also certain components of a hard drive not 
working to their full capacity can correspond to non-integer values of k. It is obvious that F(x) = G(x) 
when k = 1. 

Variates of GU(α, k) can be simulated as follows: if u is a variate of the uniform [0, 1] distribution then, 

$$x = G^{-1}\left(1 - w^{-1}(1 - u)\right),$$

or, equivalently, the root of, 

$$\bar{G}^{k+1}(x) - \left[1 + (1 - u)k\right]\bar{G}(x) + (1 - u)k = 0$$

with $0 < \bar{G}(x) < 1$, is a variate from GU(α, k). 
A simpler method for simulation when k is an integer is as follows (Ristic et al. 2007): simulate n from a 
discrete uniform distribution on {1, 2, ..., k}; simulate a random sample X_1, X_2, ..., X_n independently from the distribution 
specified by the cdf G; then min(X_1, X_2, ..., X_n) is a variate of GU(α, k). 
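
Both simulation schemes can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the baseline G is assumed to be the unit-rate exponential cdf, the inversion is carried out by bisection on F(x) = u rather than by solving the polynomial equation, and all function names are hypothetical.

```python
import math
import random

def w(t, k):
    """PGF of the discrete uniform on {1, ..., k}, extended to real k > 0."""
    if abs(1.0 - t) < 1e-12:
        return 1.0
    return t * (1.0 - t**k) / (k * (1.0 - t))

def gu_cdf(x, k, G):
    """cdf (1): F(x) = 1 - w(1 - G(x))."""
    return 1.0 - w(1.0 - G(x), k)

def sample_by_minimum(k, G_inv, rng):
    """Integer k only: draw n uniformly on {1, ..., k}, return the minimum of n baseline draws."""
    n = rng.randint(1, k)
    return min(G_inv(rng.random()) for _ in range(n))

def sample_by_inversion(k, G, rng, tol=1e-10):
    """Any real k > 0: solve F(x) = u for x by bisection."""
    u = rng.random()
    lo, hi = 0.0, 1.0
    while gu_cdf(hi, k, G) < u:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gu_cdf(mid, k, G) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed baseline for illustration: unit-rate exponential.
def G_exp(x):
    return 1.0 - math.exp(-x)

def G_exp_inv(u):
    return -math.log(1.0 - u)

rng = random.Random(0)
xs_min = [sample_by_minimum(5, G_exp_inv, rng) for _ in range(20000)]
xs_inv = [sample_by_inversion(5, G_exp, rng) for _ in range(5000)]
print(sum(xs_min) / len(xs_min), sum(xs_inv) / len(xs_inv))
```

For this baseline and k = 5 the mean of X is (1 + 1/2 + 1/3 + 1/4 + 1/5)/5, so the two sample means should both be close to 0.457. Bisection is used because t = 1 is always a root of the polynomial equation, and a bracketing search avoids that spurious root.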
Distributions constructed by means of X = min(X_1, X_2, ..., X_N) are known as compound distributions 
of a minimum of a random number of random variables. Such distributions have received considerable 
attention in recent years. Twenty prominent distributions introduced recently are the exponential geometric 
distribution (Adamidis and Loukas 1998), the exponential Poisson distribution (Kus 2007), the exponential 
logarithmic distribution (Tahmasbi and Rezaei 2008), the Weibull geometric distribution (Barreto-Souza et 
al. 2011, Morais and Barreto-Souza 2011), the Weibull logarithmic distribution (Morais and Barreto-Souza 
2011), the Weibull negative binomial distribution (Morais and Barreto-Souza 2011), the exponentiated expo- 
nential binomial distribution (Bakouch et al. 2012), the exponential negative binomial distribution (Hajebi 
et al. 2013), the modified Weibull geometric distribution (Silva et al. 2013), the exponentiated exponential 
Poisson distribution (Ristic and Nadarajah 2014), the extended generalized gamma geometric distribution 
(Bortolini et al. 2017), the exponentiated inverse Weibull geometric distribution (Chung et al. 2017), the re- 
flected generalized Topp-Leone power series distribution (Condino and Domma 2017), Topp—Leone power 
series distributions (Roozegar and Nadarajah 2017), the power Lomax Poisson distribution (Hassan and 
Nassr 2018), compounded inverse Weibull distributions (Chakrabarty and Chowdhury 2019), an extended 
generalized half logistic distribution (Muhammad and Liu 2019), generalized linear exponential geomet- 
ric distributions (Okasha and Al-Shomrani 2019), an extended Poisson family of life distributions (Ramos 
et al. 2020) and the exponential-discrete Lindley distribution (Kemaloglu and Yilmaz 2020). Families of 
distributions obtained by compounding a minimum of a random number of random variables include the 
Marshall-Olkin family of distributions (Marshall and Olkin 1997) and the generalized Marshall-Olkin fam- 
ily of distributions (Nadarajah et al. 2013). 

We shall show later that the GU distribution provides better fits than ten of these distributions for the 
hard drive data set. We shall also show that the GU distribution provides better fits than the gamma, Weibull, 
exponentiated exponential (originally due to Gupta and Kundu 1999) and exponentiated Rayleigh (originally 
due to Surles and Padgett 2001) distributions. 

We have motivated the GU distribution by the hard drive data set. It could have been motivated by other 
real examples too. Two such examples are: 


• Wu and Chang (2003) and Wu et al. (2007) considered a scheme where a random number of units is 
removed every time a failure occurs. This random number was supposed to follow a discrete uniform 
distribution. Let N denote the number of units removed and X_1, X_2, ..., X_N some characteristic (for 
example, age) of the units. 


• Chang and Huang (2014) supposed that job processing times in a reentrant flow shop follow a discrete 
uniform distribution. Let N denote the job processing time and X_1, X_2, ..., X_N the costs of waiting 
while the job is processed. 


In both examples, a variable of interest is X = min(X_1, X_2, ..., X_N). 
The calculations in this paper involve the gamma function defined by, 

$$\Gamma(a) = \int_{0}^{\infty} t^{a-1}\exp(-t)\,dt,$$

and the beta function defined by, 

$$B(a, b) = \int_{0}^{1} t^{a-1}(1 - t)^{b-1}\,dt.$$


The properties of these special functions can be found in Gradshteyn and Ryzhik (2000). 


1.3 Purposes 
The purposes of this paper are to: 


• derive various mathematical properties of X ~ GU(α, k), see Section 2. These include expansions for 
the pdf and the cdf, shape of the pdf, shape of the hazard rate function, moments, moment generating 
function, reversed residual life moments, order statistic properties, and the Rényi entropy. 


• estimate parameters of the GU distribution by the method of maximum likelihood and discuss the 
asymptotic properties of the estimators, see Section 3. 


• show that the GU distribution can be a good model for the failure times data, see Section 4. 


Some of the mathematical properties reported involve single and double infinite sums, see Section 2.6 
and the appendix. Numerical computations not reported here show that each of these infinite sums can 
be truncated at 20 to yield a relative error less than 10^{−20} for a wide range of parameter values and for a 
wide range of choices for g and G. This shows that the mathematical properties can be computed for most 
practical uses with their infinite sums truncated at twenty. The computations were performed using Maple. 
Maple took only a fraction of a second to compute the truncated versions. The computational times for the 
truncated versions were significantly smaller than those for the untruncated versions. 
2. Mathematical properties 


2.1 Expansions 


Some of the mathematical properties of the GU distribution cannot be expressed in closed form. In these 
cases, it is useful to have expansions for the pdf and the cdf. As mentioned in Section 1, the truncated 
versions of these expansions can be of practical use. 

One can demonstrate that the cdf in (1) and the pdf in (4) can be expressed as 

$$F(x) = \frac{1}{k}\sum_{a=1}^{k} Q_a(x) \qquad (6)$$

and 

$$f(x) = \frac{1}{k}\sum_{a=1}^{k} q_a(x), \qquad (7)$$

respectively, where $Q_a(x) = 1 - \bar{G}^{a}(x)$ and $q_a(x) = a\,g(x)\,\bar{G}^{a-1}(x)$. Note that q_a and Q_a are the pdf and 
cdf of the exponentiated-G distribution. Structural properties of several exponentiated-G distributions have 
been studied by many authors: see Marshall and Olkin (2007) for an excellent account. 

It follows from (6) that (1) can be expressed as a finite linear combination of cdfs of exponentiated- 
G distributions. It follows from (7) that (4) can be expressed as a finite linear combination of pdfs of 
exponentiated-G distributions. Hence, any mathematical property of the GU distribution can be expressed as 
a finite linear combination based on exponentiated-G distributions. 


2.2 Shape 


In this section, we investigate the shapes of (1), (4) and (5). Shape properties are important because they 
allow the practitioner to see if the distribution can be fitted to a given data set (this can be seen by comparing 
the shape of the histogram of the data with possible shapes of the pdf). Shape properties are also useful to 
see if the distribution can model increasing failure rates, decreasing failure rates or bathtub shaped failure 
rates. 

The shapes of the pdf, (4), and the hazard rate function, (5), can be described analytically. The critical 
points of the pdf are the roots of the equation: 

$$g'(x)\,w'\left(\bar{G}(x)\right) = g^{2}(x)\,w''\left(\bar{G}(x)\right). \qquad (8)$$

There may be more than one root to (8). If x = x_0 is a root of (8) then it corresponds to a local maximum, a 
local minimum or a point of inflexion depending on whether λ(x_0) < 0, λ(x_0) > 0 or λ(x_0) = 0, where, 

$$\lambda(x) = g^{3}(x)\,w'''\left(\bar{G}(x)\right) - 2g(x)g'(x)\,w''\left(\bar{G}(x)\right) + g''(x)\,w'\left(\bar{G}(x)\right) - g'(x)g(x)\,w''\left(\bar{G}(x)\right).$$


The critical points of the hazard rate function, (5), are the roots of the equation: 

$$g^{2}(x)\left[\frac{w'\left(\bar{G}(x)\right)}{w\left(\bar{G}(x)\right)}\right]^{2} - g^{2}(x)\,\frac{w''\left(\bar{G}(x)\right)}{w\left(\bar{G}(x)\right)} + g'(x)\,\frac{w'\left(\bar{G}(x)\right)}{w\left(\bar{G}(x)\right)} = 0. \qquad (9)$$

There may be more than one root to (9). If x = x_0 is a root of (9) then it corresponds to a local maximum, a 
local minimum or a point of inflexion depending on whether λ(x_0) < 0, λ(x_0) > 0 or λ(x_0) = 0, where, 

$$\lambda(x) = g^{3}(x)\,\frac{w'''}{w} - 3g^{3}(x)\,\frac{w'\,w''}{w^{2}} + 2g^{3}(x)\left[\frac{w'}{w}\right]^{3} - 3g(x)g'(x)\,\frac{w''}{w} + 3g(x)g'(x)\left[\frac{w'}{w}\right]^{2} + g''(x)\,\frac{w'}{w},$$

with w and its derivatives evaluated at $\bar{G}(x)$. 


Here, w(·) and w'(·) are given by (2) and (3), respectively. The second and third order derivatives are, 

$$w''(t) = \frac{t^{k+1}k^{2} - 2t^{k}k^{2} - t^{k+1}k + t^{k-1}k^{2} + t^{k-1}k + 2t^{k} - 2}{k(t - 1)^{3}}$$

and 

$$w'''(t) = -\frac{-t^{k+1}k^{3} + 3t^{k}k^{3} - 3t^{k-1}k^{3} - 2t^{k+1}k + 3t^{k+1}k^{2} + t^{k-2}k^{3} - 3t^{k}k - 6t^{k}k^{2} + 6t^{k-1}k + 3t^{k-1}k^{2} - t^{k-2}k + 6t^{k} - 6}{k(t - 1)^{4}},$$

respectively. 

The asymptotes of (1), (4) and (5) as x → 0, ∞ are, 


f(x) eee) (x) asx 0, 

f(x) ae) asx oO, 

F(x)~1 as £—> 00, 
1-G"(z) 

OO Tae) asx — 0, 

h(a) ~ ES) a asx — 0, 


2.3 Special cases 


For G(x) = 1—e~*", G(@) = 1- (1+2°)~*, G(a) = ec” and G(x) = (1+2-)~”, the GU 
distribution reduces to the Weibull uniform (WU), Burr uniform (BU), inverse Weibull uniform (IWU) and 
inverse Burr uniform (IBU) distributions, respectively. Table 1 lists the probability density and survival 
functions of these distributions. Table 2 lists the corresponding hazard rate functions. 

We have chosen the special cases to correspond to the Weibull and Burr distributions because: i) Weibull 
distribution is the most popular model for lifetime data; ii) Burr distribution is one of the most versatile 
distributions in statistics. As shown by Rodriguez (1977) and Tadikamalla (1980), the Burr distribution 
contains the shape characteristics of the normal, lognormal, gamma, logistic and exponential distributions as 
well as a significant portion of the Pearson type I, II, V, VII, IX and XII families. 

Figures 1 and 2 plot the pdfs of the WU, IWU, BU and IBU distributions. Figures 3 and 4 plot the 
hazard rate functions of the WU, IWU, BU and IBU distributions. We see that monotonically decreasing and 
unimodal shapes are possible for the pdfs. Monotonically decreasing, monotonically increasing, unimodal 
and upside down bathtub shapes are possible for the hazard rate functions. We have chosen k = 5 in Figures 
1 to 4. Other values of k did not exhibit different shapes. 
Table 1: Probability density and survival functions of some special GU distributions. 
Table 2: Hazard rate functions of some special GU distributions. 
2.4 Moments and moment generating function 


Moment properties are fundamental for any distribution. For example, the first four moments can be used 
to describe any data fairly well. Moments are also useful for estimation. 


We derive two representations for the nth moment of X ~ GU(α, k). The first is immediate from (7): 

$$E(X^{n}) = \frac{1}{k}\sum_{a=1}^{k} E\left(Y_a^{n}\right), \qquad (10)$$

where Y_a is a random variable with the cdf and pdf specified by $Q_a(x) = 1 - \bar{G}^{a}(x)$ and 
$q_a(x) = a\,g(x)\,\bar{G}^{a-1}(x)$, respectively. If N is a discrete uniform random variable and X_1, X_2, ... are independent 
random variables distributed according to the cdf G and are independent of N then, 

$$E(X^{n}) = \frac{1}{k}\sum_{m=1}^{k} E\left\{\left[\min\left(X_1, X_2, \ldots, X_m\right)\right]^{n}\right\}.$$

The ordinary moments of several GU distributions can be calculated directly from (10), see Table 3. 
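
A quick numerical cross-check of (10) against the pdf (4) is sketched below. It is an illustration under assumptions: the baseline is taken to be the unit-rate exponential, for which Y_a is exponential with rate a and E(Y_a^n) = n!/a^n; the integral of x^n f(x) is approximated by a midpoint rule on [0, 40]; the function names are hypothetical.

```python
import math

def gu_pdf(x, k, g, G):
    """pdf (4) of the GU distribution for a baseline pdf g and cdf G."""
    sbar = 1.0 - G(x)
    num = 1.0 - (k + 1) * sbar**k + k * sbar ** (k + 1)
    return g(x) * num / (k * (sbar - 1.0) ** 2)

def moment_numeric(n, k, g, G, upper=40.0, steps=40000):
    """Midpoint-rule approximation of the nth moment, integral of x^n f(x) over [0, upper]."""
    h = upper / steps
    return h * sum(
        ((i + 0.5) * h) ** n * gu_pdf((i + 0.5) * h, k, g, G) for i in range(steps)
    )

def moment_formula_exponential(n, k):
    """Formula (10) with a unit-rate exponential baseline: E(Y_a^n) = n! / a^n."""
    return math.factorial(n) / k * sum(a ** (-n) for a in range(1, k + 1))

# Assumed baseline for illustration: unit-rate exponential.
def g_exp(x):
    return math.exp(-x)

def G_exp(x):
    return 1.0 - math.exp(-x)

for n in (1, 2):
    print(n, moment_numeric(n, 5, g_exp, G_exp), moment_formula_exponential(n, 5))
```

The midpoint rule is used so that the integrand is never evaluated at x = 0, where (4) takes the indeterminate form 0/0.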
Figure 1: Plots of the probability density functions of the WU and IWU distributions. 


Table 3: Moments of some special GU distributions. 

We derive two representations for the moment generating function M(t) = E[exp(tX)] of X ~ GU(α, k). 
The first one is, 

$$M(t) = \sum_{j=0}^{\infty}\frac{t^{j}}{j!}\,\mu'_j,$$

where $\mu'_j = E\left(X^{j}\right)$ is given by (10). The other one comes from (7): 

$$M(t) = \frac{1}{k}\sum_{a=1}^{k} M_a(t),$$

where M_a(t) denotes the moment generating function of Y_a. So, the moment generating function of several 
GU distributions can be determined from the moment generating function of Y_a. 

Figure 3: Plots of the hazard rate functions of the WU and IWU distributions. 

Moments of residual lifetime and reversed residual lifetime random variables are extensively used in 
actuarial sciences in the analysis of risks. Given survival to time t, the residual life is the period from time t 
until the time of failure. The rth moment of residual lifetime and reversed residual lifetime random variables 
for X ~ GU(α, k) can be expressed as, 


p(t) = B[(X-t)"|X>¢ 


and 


m(t) = El(X-t"|X <4 


1 ft : 
= gq [@-V wd 
~ Fi 2 (") (=1) |B) = Le, & 4,0), 


7 
i=0 


respectively, where L(···) is defined in the appendix. 
Figure 4: Plots of the hazard rate functions of the BU and IBU distributions. 


2.5 Order statistics 


Order statistics have been used in a wide range of problems, including robust statistical estimation, detec- 
tion of outliers, estimation using L moments, characterization of probability distributions, goodness-of-fit 
tests, entropy estimation, analysis of censored samples, reliability analysis, quality control and strength of 
materials. 


Suppose X_1, X_2, ..., X_n is a random sample from the GU distribution. Let X_{i:n} denote the ith order 
statistic when X_1, X_2, ..., X_n are arranged in increasing order. The pdf of X_{i:n} can be expressed as, 

$$f_{i:n}(x) = \frac{f(x)}{B(i, n-i+1)}\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}F^{i+j-1}(x)$$

$$= \frac{1}{B(i, n-i+1)}\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}k^{-i-j}\sum_{p=1}^{k}\sum_{r_1=1}^{k}\cdots\sum_{r_{i+j-1}=1}^{k} q_p(x)\,Q_{r_1}(x)\cdots Q_{r_{i+j-1}}(x).$$

The sth moment of X_{i:n} can be expressed as, 

$$E\left(X_{i:n}^{s}\right) = \frac{1}{B(i, n-i+1)}\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}k^{-i-j}\sum_{p=1}^{k}\sum_{r_1=1}^{k}\cdots\sum_{r_{i+j-1}=1}^{k}\int_{0}^{\infty} x^{s}\,q_p(x)\,Q_{r_1}(x)\cdots Q_{r_{i+j-1}}(x)\,dx.$$

The moment generating function of X_{i:n} can be expressed as 

$$M_{i:n}(t) = \frac{1}{B(i, n-i+1)}\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^{j}k^{-i-j}\sum_{p=1}^{k}\sum_{r_1=1}^{k}\cdots\sum_{r_{i+j-1}=1}^{k}\int_{0}^{\infty} \exp(tx)\,q_p(x)\,Q_{r_1}(x)\cdots Q_{r_{i+j-1}}(x)\,dx.$$


2.6 Rényi entropy 


The entropy of a random variable, X, a measure of its uncertainty, has many applications in various fields 
of science and engineering. The most popular entropy measure is the Rényi entropy (Rényi 1961). For a 
random variable X with pdf f(x), it is defined as 

$$I_R(\gamma) = \frac{1}{1 - \gamma}\log\left(\int f^{\gamma}(x)\,dx\right)$$

for γ > 0 and γ ≠ 1. 


Recent applications of the Rényi entropy include: ultrasonic molecular imaging (Hughes et al. 2009); 
molecular imaging of tumors using a clinically relevant protocol (Marsh et al. 2010); sparse kernel density 
estimation and its application in variable selection (Han et al. 2011). 

For X ~ GU(a,k), 


[ fl(a)dx 


IrR(y) = 


ll 
3 
Me: 
fo oS 
So 
SS 
3 w 


ete i G(2)™-27G(a)*4 g(x) da 


j=0m=0 J 
- > (7) (2) aieac.am) 


where, 


I(y,j,m) = | Timea @ _ uy’ gt} (G-*(u)) du. 


Therefore, the Rényi entropy for X ~ GU(a, k) can be expressed as 


5 log se ss (; se —1)Pk™-71(y,5,m) 


j=0 m=0 


inl) = 


The Rényi entropies for the WU, IWU, BU and IBU distributions are given in the appendix.
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3. Estimation 


Suppose $x_1, \ldots, x_n$ is a random sample from the GU distribution with unknown parameters $\Theta = (a, k)^T$, an
r × 1 parameter vector say. We determine the maximum likelihood estimators of the parameters.
The log-likelihood function of Θ is,

$$\ell_n(\Theta) = -n \log k + \sum_{i=1}^n \log g(x_i) - 2\sum_{i=1}^n \log G(x_i) + \sum_{i=1}^n \log\left[1 - kG(x_i)\bar{G}^k(x_i) - \bar{G}^k(x_i)\right], \qquad (11)$$

where $\bar{G}(x) = 1 - G(x)$.
The log-likelihood function can be maximized either directly by using SAS (PROC NLMIXED) or the Ox 
program (subroutine MaxBFGS) (see Doornik 2007) or by solving the nonlinear normal equations obtained 
by differentiating (11). The latter are, 


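$$\frac{\partial \ell_n}{\partial a} = \sum_{i=1}^n \frac{\partial g(x_i)/\partial a}{g(x_i)} - 2\sum_{i=1}^n \frac{\partial G(x_i)/\partial a}{G(x_i)} + \sum_{i=1}^n \frac{\partial G(x_i)}{\partial a}\, \frac{\bar{G}^{k-1}(x_i)\left[k^2 G(x_i) - k\bar{G}(x_i) + k\right]}{1 - \bar{G}^k(x_i)\left[kG(x_i) + 1\right]}$$

and

$$\frac{\partial \ell_n}{\partial k} = -\frac{n}{k} + \sum_{i=1}^n \frac{G(x_i) + \log \bar{G}(x_i)\left[kG(x_i) + 1\right]}{kG(x_i) + 1 - \bar{G}^{-k}(x_i)}.$$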


In the practical data application presented in Section 4, the SAS procedure was used to obtain the maximum
likelihood estimates. Numerical computations not reported here showed that the surface of (11) was smooth
for given smooth functions g(·) and G(·). The SAS procedure was able to locate the maximum of the
likelihood surface for a wide range of smooth functions and for a wide range of starting values, including
starting values determined by the method of moments, i.e., the simultaneous solutions of

$$E\left(X^s\right) = \frac{1}{n}\sum_{j=1}^n x_j^s$$

for s = 1, 2, ..., r, where the left hand side is given by (10). These equations were solved numerically using
SOLVE in SAS as (10) is not in closed form. The maximum likelihood estimates and the method of moments
estimates were unique for all starting values.
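The log-likelihood (11) can be sketched in a few lines. The sketch below is an illustration only: it reads (11) as −n log k + Σ log g(x_i) − 2 Σ log G(x_i) + Σ log[1 − kG(x_i)(1 − G(x_i))^k − (1 − G(x_i))^k], assumes a Weibull baseline G(x) = 1 − exp(−(x/scale)^shape), and uses hypothetical function names and a made-up sample. As a sanity check, for integer k it compares the result with the density of the minimum of N baseline lifetimes when N is uniform on {1, ..., k}.

```python
import math

def weibull_cdf(x, scale, shape):
    # Assumed baseline cdf G(x) = 1 - exp(-(x / scale)**shape)
    return 1.0 - math.exp(-((x / scale) ** shape))

def weibull_pdf(x, scale, shape):
    z = (x / scale) ** shape
    return shape / x * z * math.exp(-z)

def gu_loglik(data, k, scale, shape):
    """Log-likelihood in the form read from (11); k > 0 need not be an integer."""
    total = -len(data) * math.log(k)
    for x in data:
        G = weibull_cdf(x, scale, shape)
        Gbar = 1.0 - G
        g = weibull_pdf(x, scale, shape)
        bracket = 1.0 - k * G * Gbar ** k - Gbar ** k
        total += math.log(g) - 2.0 * math.log(G) + math.log(bracket)
    return total

def series_mixture_loglik(data, k, scale, shape):
    """Integer k only: minimum of N components, N uniform on {1, ..., k}."""
    total = 0.0
    for x in data:
        G = weibull_cdf(x, scale, shape)
        g = weibull_pdf(x, scale, shape)
        density = sum(n * (1.0 - G) ** (n - 1) for n in range(1, k + 1)) * g / k
        total += math.log(density)
    return total

sample = [0.5, 0.8, 1.1, 1.4]  # made-up observations, for illustration only
print(gu_loglik(sample, 3, 1.0, 2.0), series_mixture_loglik(sample, 3, 1.0, 2.0))
```

A function such as `gu_loglik` could then be passed to any general-purpose optimizer; the text itself used SAS PROC NLMIXED for this step.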


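Let $\hat{\Theta} = (\hat{a}, \hat{k})^T$ denote the maximum likelihood estimator of $\Theta = (a, k)^T$. Under certain regularity
conditions (see, for example, Ferguson (1996) and Lehmann and Casella (1998), pages 461-463), the
distribution of $\hat{\Theta}$ as n → ∞ is the r-variate normal with mean Θ and covariance given by the inverse of,

$$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix} = \begin{pmatrix} E\left(-\dfrac{\partial^2 \ell_n}{\partial a^2}\right) & E\left(-\dfrac{\partial^2 \ell_n}{\partial a\,\partial k}\right) \\ E\left(-\dfrac{\partial^2 \ell_n}{\partial k\,\partial a}\right) & E\left(-\dfrac{\partial^2 \ell_n}{\partial k^2}\right) \end{pmatrix}.$$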


Here, I is the expected information matrix. 

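In practice, n is finite. The recommended approximation (see, for example, Cox and Hinkley 1979) for
the distribution of $\hat{\Theta}$ is the r-variate normal distribution with mean Θ and covariance taken to be the inverse
of

$$J = \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix} = -\begin{pmatrix} \dfrac{\partial^2 \ell_n}{\partial a^2} & \dfrac{\partial^2 \ell_n}{\partial a\,\partial k} \\ \dfrac{\partial^2 \ell_n}{\partial k\,\partial a} & \dfrac{\partial^2 \ell_n}{\partial k^2} \end{pmatrix}\Bigg|_{\Theta = \hat{\Theta}}.$$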
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Here, J is the observed information matrix. This is known to be a better approximation than the one based 
on the expected information matrix, see Cox and Hinkley (1979). 


4. Practical data analysis 


In this section, we return to the hard drive data set discussed in Section 1. As explained there, the failure time 
of a hard drive can be modeled as the failure of a system having components working in series. We fitted the 
following distributions to the data: the two-parameter gamma distribution specified by the pdf, 


$$f(x) = \frac{b^a x^{a-1} e^{-bx}}{\Gamma(a)}$$

for x > 0, a > 0 and b > 0; the two-parameter Weibull distribution specified by the pdf,

$$f(x) = a b x^{a-1} e^{-b x^a}$$

for x > 0, a > 0 and b > 0; the two-parameter exponentiated exponential (Gupta and Kundu 1999)
distribution specified by the pdf,

$$f(x) = a b e^{-bx}\left[1 - e^{-bx}\right]^{a-1}$$

for x > 0, a > 0 and b > 0; the two-parameter exponentiated Rayleigh (Surles and Padgett 2001) distribution
specified by the pdf,

$$f(x) = 2 a b^2 x e^{-(bx)^2}\left[1 - e^{-(bx)^2}\right]^{a-1}$$


for x > 0, a > 0 and b > 0; the three-parameter WU distribution; the three-parameter IWU distribution;
the four-parameter BU distribution; the four-parameter IBU distribution; the two-parameter exponential
geometric (Adamidis and Loukas 1998) distribution specified by the pdf,

$$f(x) = \frac{b(1-p)e^{-bx}}{\left[1 - p e^{-bx}\right]^2}$$

for x > 0, b > 0 and 0 < p < 1; the two-parameter exponential Poisson distribution (Kus 2007) specified
by the pdf,

$$f(x) = \frac{\lambda b\, e^{-\lambda - bx + \lambda e^{-bx}}}{1 - e^{-\lambda}}$$

for x > 0, b > 0 and λ > 0; the two-parameter exponential logarithmic distribution (Tahmasbi and Rezaei
2008) specified by the pdf,

$$f(x) = \frac{b(1-p)e^{-bx}}{(-\log p)\left[1 - (1-p)e^{-bx}\right]}$$


for x > 0, b > 0 and 0 < p < 1; the three-parameter Weibull geometric (Barreto-Souza et al. 2011)
distribution specified by the pdf,

$$f(x) = \frac{a b^a (1-p) x^{a-1} e^{-(bx)^a}}{\left[1 - p e^{-(bx)^a}\right]^2}$$

for x > 0, a > 0, b > 0 and 0 < p < 1; the three-parameter Weibull logarithmic distribution (Morais and
Barreto-Souza 2011) specified by the pdf,

$$f(x) = \frac{a b^a (1-p) x^{a-1} e^{-(bx)^a}}{(-\log p)\left[1 - (1-p)e^{-(bx)^a}\right]}$$


for x > 0, a > 0, b > 0 and 0 < p < 1; the four-parameter Weibull negative binomial distribution (Morais
and Barreto-Souza 2011) specified by the pdf,


for x > 0, a > 0, b > 0, 0 < p < 1 and s > 0; the four-parameter exponentiated exponential binomial
distribution (Bakouch et al. 2012) specified by the pdf,

$$f(x) = \frac{a b s p\, e^{-bx}\left(1 - e^{-bx}\right)^{a-1}\left[1 - p\left(1 - e^{-bx}\right)^a\right]^{s-1}}{1 - (1-p)^s}$$


for x > 0, a > 0, b > 0, 0 < p < 1 and s > 0; the three-parameter exponential negative binomial
distribution (Hajebi et al. 2013) specified by the pdf,

$$f(x) = \frac{k b (1-p)^k e^{-kbx}}{\left[1 - p e^{-bx}\right]^{k+1}}$$


for x > 0, b > 0, 0 < p < 1 and k > 0; the four-parameter modified Weibull geometric (Silva et al. 2013)
distribution specified by the pdf,

$$f(x) = \frac{b^a (1-p) x^{a-1} (a + \lambda x)\, e^{\lambda x - (bx)^a e^{\lambda x}}}{\left[1 - p e^{-(bx)^a e^{\lambda x}}\right]^2}$$
for x > 0, a > 0, b > 0, λ > 0 and 0 < p < 1; the three-parameter exponentiated exponential Poisson
distribution (Ristic and Nadarajah 2014) specified by the pdf,

$$f(x) = \frac{a b \lambda\, e^{-bx}\left(1 - e^{-bx}\right)^{a-1} e^{-\lambda\left(1 - e^{-bx}\right)^a}}{1 - e^{-\lambda}}$$


for x > 0, a > 0, b > 0 and λ > 0. In total, eighteen distributions were fitted to the hard drive data set.
Seven of these distributions have two parameters each. Six of these distributions have three parameters each.
The remaining five distributions have four parameters each. The distributions were fitted by the methods of
maximum likelihood and moments. The WU, IWU, BU and IBU distributions were fitted by following the
procedures in Section 3.

Many of the fitted distributions are not nested. Discrimination among them was therefore performed using
various criteria:


• the Akaike information criterion due to Akaike (1974) defined by,

$$\mathrm{AIC} = 2q - 2\log L(\hat{\Theta}),$$

where Θ is the vector of unknown parameters, $\hat{\Theta}$ is the maximum likelihood estimate of Θ and q is
the number of unknown parameters;

• the Bayesian information criterion due to Schwarz (1978) defined by,

$$\mathrm{BIC} = q\log n - 2\log L(\hat{\Theta});$$

• the consistent Akaike information criterion (CAIC) due to Bozdogan (1987) defined by,

$$\mathrm{CAIC} = -2\log L(\hat{\Theta}) + q(\log n + 1);$$

• the corrected Akaike information criterion (AICc) due to Hurvich and Tsai (1989) defined by,

$$\mathrm{AICc} = \mathrm{AIC} + \frac{2q(q+1)}{n - q - 1};$$

• the Hannan-Quinn criterion due to Hannan and Quinn (1979) defined by,

$$\mathrm{HQC} = -2\log L(\hat{\Theta}) + 2q\log\log n;$$

• the p-value of the Kolmogorov-Smirnov statistic (Kolmogorov 1933, Smirnov 1948) defined by,

$$\sup_x \left| \frac{1}{n}\sum_{i=1}^n I\{x_i \le x\} - \hat{F}(x) \right|,$$

where I{·} denotes the indicator function and $\hat{F}(\cdot)$ the maximum likelihood estimate of F(x);

• the p-value of the Kolmogorov-Smirnov statistic (Kolmogorov 1933, Smirnov 1948) defined by,

$$\sup_x \left| \frac{1}{n}\sum_{i=1}^n I\{x_i \le x\} - \tilde{F}(x) \right|,$$

where $\tilde{F}(\cdot)$ is the method of moments estimate of F(x);

• the p-value of the Anderson-Darling statistic (Anderson and Darling 1954) defined by,

$$-n - \sum_{i=1}^n \frac{2i-1}{n}\left\{\log \hat{F}\left(x_{(i)}\right) + \log\left[1 - \hat{F}\left(x_{(n+1-i)}\right)\right]\right\},$$

where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ is the observed data arranged in increasing order;

• the p-value of the Anderson-Darling statistic (Anderson and Darling 1954) defined by,

$$-n - \sum_{i=1}^n \frac{2i-1}{n}\left\{\log \tilde{F}\left(x_{(i)}\right) + \log\left[1 - \tilde{F}\left(x_{(n+1-i)}\right)\right]\right\};$$

• the p-value of the Cramér-von Mises statistic (Cramér 1928, von Mises 1931) defined by,

$$\frac{1}{12n} + \sum_{i=1}^n \left[\frac{2i-1}{2n} - \hat{F}\left(x_{(i)}\right)\right]^2;$$

• the p-value of the Cramér-von Mises statistic (Cramér 1928, von Mises 1931) defined by,

$$\frac{1}{12n} + \sum_{i=1}^n \left[\frac{2i-1}{2n} - \tilde{F}\left(x_{(i)}\right)\right]^2.$$


The smaller the values of AIC, BIC, CAIC, AICc and HQC, the better the fit. For more discussion on these
criteria, see Burnham and Anderson (2004) and Fang (2011).
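The five information criteria can be computed together from a maximized log-likelihood. The sketch below is illustrative: the function name is hypothetical, and the sample size n = 100 used in the example is an assumption inferred from the gap between the AIC and BIC values reported for the gamma distribution in Table 4, not a figure stated in this section.

```python
import math

def information_criteria(log_lik, q, n):
    """Return AIC, BIC, CAIC, AICc and HQC for a fit with q parameters and n observations."""
    aic = 2 * q - 2 * log_lik
    return {
        "AIC": aic,
        "BIC": q * math.log(n) - 2 * log_lik,
        "CAIC": -2 * log_lik + q * (math.log(n) + 1),
        "AICc": aic + 2 * q * (q + 1) / (n - q - 1),
        "HQC": -2 * log_lik + 2 * q * math.log(math.log(n)),
    }

# Gamma row of Table 4: -log L = -18.430, i.e. log L = 18.430, with q = 2.
# n = 100 is an inferred assumption (see the lead-in above).
gamma_row = information_criteria(log_lik=18.430, q=2, n=100)
for name, value in gamma_row.items():
    print(name, round(value, 3))
```

Under that assumed sample size the computed values agree with the gamma row of Table 4 to the reported precision, which is a useful consistency check on both the formulas and the table.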

Since the Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests assume that the fitted 
distribution gives the “true” parameter values, the p-values were computed by simulation as follows: 


(i) fit the distribution to the data and compute the corresponding Kolmogorov-Smirnov/Anderson-Darling/ 
Cramér-von Mises statistic; 


(ii) generate 10000 samples each of the same size as the data from the fitted model in step (i); 
(iii) refit the model to each of the 10000 samples; 


(iv) compute the Kolmogorov-Smirnov/Anderson-Darling/Cramér-von Mises statistic for the 10000 fits in
step (iii);


(v) construct an empirical cdf of the 10000 values of the Kolmogorov-Smirnov/Anderson-Darling/Cramér- 
von Mises statistic obtained in step (iv); 


(vi) compare the Kolmogorov-Smirnov/Anderson-Darling/Cramér-von Mises statistic obtained in step (i) 
with the empirical cdf in step (v) to get the p-value. 
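Steps (i) to (vi) amount to a parametric bootstrap. The following minimal sketch follows them for the Kolmogorov-Smirnov statistic only, and it substitutes an exponential model (whose maximum likelihood estimate is available in closed form) for the distributions fitted in the text. The function names, the seed handling and the example sample are assumptions made for the illustration.

```python
import math
import random

def ks_statistic(sample, cdf):
    """sup_x |empirical cdf - fitted cdf|, evaluated at the jump points."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def fit_exponential(sample):
    """Maximum likelihood estimate of the exponential rate."""
    return len(sample) / sum(sample)

def bootstrap_pvalue(data, n_boot=10000, seed=0):
    rng = random.Random(seed)
    n = len(data)
    rate = fit_exponential(data)                                   # step (i)
    observed = ks_statistic(data, lambda x: 1.0 - math.exp(-rate * x))
    exceed = 0
    for _ in range(n_boot):
        sim = [rng.expovariate(rate) for _ in range(n)]            # step (ii)
        r = fit_exponential(sim)                                   # step (iii)
        stat = ks_statistic(sim, lambda x, r=r: 1.0 - math.exp(-r * x))  # step (iv)
        if stat >= observed:                                       # steps (v)-(vi)
            exceed += 1
    return exceed / n_boot

demo_rng = random.Random(1)
demo = [demo_rng.expovariate(2.0) for _ in range(30)]  # synthetic data, for illustration
print(bootstrap_pvalue(demo, n_boot=1000, seed=2))
```

Refitting the model inside the loop is what distinguishes this from a plain Kolmogorov-Smirnov test: the reference distribution of the statistic then accounts for the parameters having been estimated from the data.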


The values of −log L, AIC, BIC, CAIC, AICc and HQC for the eighteen fitted distributions are given
in Table 4. The p-values of the Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises statistics
for the eighteen fitted distributions are given in Table 5. In these tables, EE, ER, EG, EP, EL, WG, WL,
WNB, EEB, ENB, MWG and EEP denote the exponentiated exponential, exponentiated Rayleigh, exponential
geometric, exponential Poisson, exponential logarithmic, Weibull geometric, Weibull logarithmic,
Weibull negative binomial, exponentiated exponential binomial, exponential negative binomial, modified
Weibull geometric and exponentiated exponential Poisson distributions, respectively.

We can see that the smallest AIC, the smallest BIC, the smallest CAIC, the smallest AICc, the smallest 
HQC and the largest p-values are for the WU distribution. The second smallest AIC, the second smallest 
BIC, the second smallest CAIC, the second smallest AICc, the second smallest HQC and the second largest 
p-values are for the Weibull distribution. The largest AIC, the largest BIC, the largest CAIC, the largest 
AICc, the largest HQC and the smallest p-values are for the ENB distribution. 

There is not much difference between the p-values obtained by the methods of maximum likelihood and 
moments. The relative performances of the eighteen distributions with respect to the p-values appear the
same for both methods. At the five percent level, all of the fitted distributions appear acceptable except for 
the EP and ENB distributions. However, the best fitting distribution in terms of the twelve criteria is the WU 
distribution. It is pleasing that the WU distribution gives the best fit in spite of having one parameter less 
than the five distributions each having four parameters. 


The parameter estimates for the best fitting WU distribution are $\hat{k} = 0.077\,(0.016)$, $\hat{A}^{-1/\hat{B}} = 0.820\,(0.009)$
and $\hat{B} = 6.881\,(0.092)$. The standard errors given in brackets were computed by inverting the observed
information matrix, see Section 3. The parameter estimates imply that the failure time of a hard drive can be
modeled as the failure time of a discrete uniform number of components working in series, the average number
being 0.538. The average number being less than one means that hard drives do not work to their full capacity
on average. The failure time of each component has a Weibull distribution with mean equal to 0.767 and
variance equal to 0.017.
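The quoted summary figures can be checked approximately from the parameter estimates. The sketch below rests on two assumptions that are not spelled out in this section: that the average number of components is (k + 1)/2, as for a discrete uniform variable on {1, ..., k}, and that the reported quantity 0.820 is the scale and 6.881 the shape of a Weibull baseline with cdf 1 − exp(−(x/scale)^shape). The variable names are illustrative.

```python
import math

k_hat = 0.077       # reported estimate of k
scale_hat = 0.820   # reported estimate, assumed to be the Weibull scale
shape_hat = 6.881   # reported estimate, assumed to be the Weibull shape

# Mean of a discrete uniform variable on {1, ..., k}, extended to non-integer k (assumption).
mean_components = (k_hat + 1.0) / 2.0

# Standard Weibull mean and variance in the scale/shape parameterization.
g1 = math.gamma(1.0 + 1.0 / shape_hat)
g2 = math.gamma(1.0 + 2.0 / shape_hat)
weibull_mean = scale_hat * g1
weibull_var = scale_hat ** 2 * (g2 - g1 ** 2)

print(mean_components, weibull_mean, weibull_var)
```

Because the estimates are reported to three decimals, the recomputed values can only be expected to match the quoted 0.538, 0.767 and 0.017 up to rounding.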
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Table 4: Log-likelihood values and information criteria for the distributions fitted to the hard drive failure data. 


Distribution   -log L     AIC        BIC        CAIC       AICc       HQC
Gamma          -18.430    -32.860    -27.650    -25.650    -32.736    -30.751
Weibull        -50.759    -97.517    -92.307    -90.307    -97.393    -95.408
EE             -0.289     3.422      8.632      10.632     3.545      5.530
ER             -21.323    -38.646    -33.435    -31.435    -38.522    -36.537
EEP            21.321     48.641     56.457     59.457     48.891     51.805
EP             81.252     166.503    171.714    173.714    166.627    168.612
EG             82.333     168.666    173.876    175.876    168.790    170.775
ENB            82.333     170.666    178.481    181.481    170.916    173.829
EL             11.975     27.950     33.160     35.160     28.074     30.059
WG             -8.919     -11.839    -4.023     -1.023     -11.589    -8.675
WNB            -8.919     -9.839     0.582      4.582      -9.418     -5.621
WL             -3.686     -1.373     6.443      9.443      -1.123     1.790
EEB            3.939      15.878     26.298     30.298     16.299     20.095
MWG            -8.919     -9.839     0.582      4.582      -9.418     -5.621
WU             -53.893    -101.787   -93.971    -90.971    -101.537   -98.624
IWU            49.310     104.619    112.435    115.435    104.869    107.783
BU             -44.827    -81.654    -71.234    -67.234    -81.233    -77.437
IBU            -12.226    -16.453    -6.032     -2.032     -16.032    -12.235


Table 5: p-values of goodness of fit statistics for the distributions fitted to the hard drive failure data. 


               KS p-value        AD p-value        CV p-value
Distribution   MLE      MME      MLE      MME      MLE      MME
Gamma          0.196    0.233    0.269    0.199    0.196    0.173
Weibull        0.267    0.256    0.276    0.249    0.294    0.236
EE             0.136    0.102    0.066    0.108    0.094    0.108
ER             0.211    0.246    0.274    0.243    0.232    0.208
EEP            0.105    0.061    0.046    0.036    0.074    0.060
EP             0.064    0.058    0.027    0.024    0.053    0.026
EG             0.051    0.039    0.026    0.018    0.041    0.025
ENB            0.044    0.012    0.006    0.011    0.017    0.019
EL             0.108    0.068    0.054    0.042    0.075    0.060
WG             0.147    0.183    0.083    0.174    0.143    0.149
WNB            0.140    0.114    0.071    0.125    0.113    0.124
WL             0.147    0.183    0.083    0.174    0.143    0.149
EEB            0.127    0.086    0.054    0.066    0.080    0.077
MWG            0.146    0.158    0.077    0.148    0.123    0.136
WU             0.289    0.279    0.278    0.296    0.296    0.265
IWU            0.072    0.059    0.040    0.027    0.057    0.053
BU             0.260    0.248    0.275    0.245    0.259    0.214
IBU            0.163    0.224    0.268    0.198    0.191    0.173


The density, probability-probability and quantile-quantile plots for the best fitting WU distribution are
shown in Figures 5, 6 and 7. We see that its fit is reasonable except possibly in the lower tail. Future work
is required to find distributions providing better fits to the lower tail.
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Figure 5: Fitted WU pdf and the histogram for the hard drive failure data. 
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Figure 6: Probability plot for the fitted WU distribution for the hard drive failure data. 


5. Conclusions 


Motivated by a failure times data set of hard drives, we have proposed a class of distributions. We have 
studied various mathematical properties of the distributions, derived estimators by the method of maximum 
likelihood and discussed the asymptotic properties of the estimators. In particular, we have shown that the 
hazard rate of the failure time of a hard drive can be monotonically decreasing, monotonically increasing, 
unimodal or upside down bathtub shaped. 
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Figure 7: Quantile plot for the fitted WU distribution for the hard drive failure data. 


We have shown that the proposed distributions fit the failure times data well. They provide better fits than 
at least fourteen other known distributions, including the gamma, Weibull, exponentiated exponential and 
exponentiated Rayleigh distributions. None of these distributions have been previously used (neither have 
they been physically motivated) to model failure time of a hard drive. 


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Figure 8: Reliability of the hard drive. 


The reliability associated with the failure time of a hard drive is shown in Figure 8. As expected, 
the reliability is a decreasing function of the failure time. For example, the probabilities that the hard 
drive will continue to operate without failures after 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 days are 
0.9999995, 0.9999393, 0.9990117, 0.9928671, 0.9673065, 0.8899872, 0.7141655, 0.4300986, 0.1499379 
and 0.01988396, respectively. The probability that the hard drive will continue to operate without failures
after 1000 days is almost zero. In particular, the probability that the hard drive will continue to operate without
failures after 1300 days is less than $10^{-10}$.
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Appendix 


The function L in Section 2.4 can be expressed as, 


L(a,k,c,t) = [oe t@ar=¥ 


j=0 


Lag + lye; c) (f+ 1)Ia(g +k + 2,0, €) 
+ ; 
k jtk+2 


G+1(kK+)IhngG+k+1,a,c) 
kj+k+1) , 


where, 
1 


Ly (a,a,c,t) = [ axz°q(x)G(x)da = of. [G-1(u)]" (1 —u)* "du. 


Rényi entropy in Section 2.6 for the WU, IWU, BU and IBU distributions can be expressed as, 
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respectively. 
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Chapter 13 


Comparing the Performance of G-family 
Probability Distribution for Modeling 
Rainfall Data 


Md Mostafizur Rahman,* Md Abdul Khalek and M Sayedur Rahman


1. Introduction 


Rain is an important natural resource that results from the interaction of several complex atmospheric
processes. The quantity of rainfall determines the water available in a particular area, which is
essential for agricultural production, industrial development and other human activities. All plants need
at least some water to survive. Sufficient rain is a blessing for agriculture, but it becomes dangerous
if it is scarce or in excess. So, knowledge of the distribution of rainfall in time and space is necessary for the
development of a particular economy. Statistical probability distributions are a proven tool for describing many
natural and social problems by providing suitable models and methods. Simple summary statistics give some idea
about the rainfall status, but knowledge about rainfall is enhanced by fitting different statistical distributions.
Maliva and Missimer (2012) showed that the Normal, Gamma, Gumbel and Weibull probability distributions gave better
results for fitting rainfall data from arid and semi-arid regions. Sen and Eljadid (1999) investigated the
performance of statistical distributions in the case of monthly Libyan rainfall data over 20 years and found
that the Gamma distribution performed best. Al-Mansory (2005) compared the performance
of different statistical distributions such as Normal, Log-Normal, Log-Normal type II, Pearson type III,
Log-Pearson type III and Gumbel for maximum monthly rainfall data of Basrah station, Iraq and found that
the Pearson type III and Gumbel distributions performed better than the other distributions. Olumide et al. (2013)
fitted Gumbel, log-Gumbel, Normal and Log-Normal probability distribution models to various rainfalls
and runoffs for the Tagwai dam site in Minna, Nigeria and found that the Normal and Log-Normal distributions
were most appropriate for the prediction of yearly maximum daily rainfall and yearly maximum daily runoff,
respectively, for this study area. Alghazali and Alawadi (2014) fitted three statistical distributions, Normal,
Gamma and Weibull, to data from thirteen Iraqi weather stations and found that, by the Chi-square test, the
Gamma distribution was suitable for five stations while the Normal and Weibull distributions were not
appropriate, and that the Kolmogorov-Smirnov test indicated that none of these three distributions was suitable
for any of these five stations.

Although classical statistical distributions fit well in many real world
situations, they have some restrictions and limitations, which led researchers to build new distributions that
are more flexible and can overcome them. These new distributions are known as members of the G-family of
distributions, which are defined by adding one or more parameters to the cumulative distribution function of
a classical statistical distribution. Several authors have proposed G-family distributions and found that in most
cases these extended distributions perform better than classical distributions. For example, Marshall and
Olkin (1997) proposed the Marshall-Olkin G family, Cordeiro and Castro (2011) proposed the Kumaraswamy G family,
Alexander et al. (2012) proposed the McDonald G family, Bourguignon et al. (2014) proposed the Weibull-G family,
Cordeiro et al. (2013) proposed the exponential half-logistic G family, Tahir et al. (2016) proposed the
Logistic-X G family, Alizadeh et al. (2015) proposed the Kumaraswamy Marshall-Olkin G family, Nofal et al. (2017)
proposed the Generalized Transmuted G family, Merovci et al. (2017) proposed the Exponentiated Transmuted-G
family and Yousof et al. (2018) proposed the Marshall-Olkin generalized-G family.

From the above discussion it is clear that the Normal, log-Normal, Gamma, Gumbel and Weibull
distributions are well suited for fitting rainfall data. Recently researchers have been trying to find
better performing distributions by adding extra parameters to existing classical distributions. These
distributions, known as G-family distributions, perform better in fitting and predicting a variety of real data.
Applications of G-family distributions in environmental science, especially to rainfall data, are rare and
therefore of interest. So, the aim of this paper is to find the best performing distribution from a set of different
G-family distributions, namely the Gamma uniform G, Kumaraswamy G, Marshall-Olkin G
and Weibull G families, for rainfall data of the Rajshahi division, Bangladesh.


2. G-family distribution 


Rainfall data analysis depends on different distribution patterns. It is always interesting for researchers to find
the best performing distribution for modeling rainfall data of certain areas. In this section we present the
mathematical forms of different G-family distributions and some goodness-of-fit measures.
Teimouri and Nadarajah (2019) developed a Maximum Product Spacing (MPS) package for computing
probability density functions and cumulative distribution functions, estimating parameters and drawing q-q plots
for different G-family distributions. In our study we used Teimouri and Nadarajah's (2019) MPS package
for data analysis.


2.1 Gamma uniform G distribution 


The general form of the probability density function of the Gamma Uniform G distribution proposed by
Torabi and Montazeri (2012) is given by,

$$f(x,\Theta) = \frac{h(x-\mu,\theta)}{\Gamma(\alpha)\left[1 - H(x-\mu,\theta)\right]^2} \left(\frac{H(x-\mu,\theta)}{1 - H(x-\mu,\theta)}\right)^{\alpha-1} \exp\left(-\frac{H(x-\mu,\theta)}{1 - H(x-\mu,\theta)}\right)$$

where θ is the baseline family parameter vector, and α > 0 and μ are extra parameters induced to the baseline
cumulative distribution function H whose probability density function is h. The general form for the
cumulative distribution function of this distribution can be written as:

$$F(x,\Theta) = \int_0^{\frac{H(x-\mu,\theta)}{1 - H(x-\mu,\theta)}} \frac{y^{\alpha-1}\exp(-y)}{\Gamma(\alpha)}\,dy$$

The baseline H refers to the cumulative distribution function of different families such as Chen, Frechet,
Log-Normal and Weibull distributions. The parameter vector is Θ = (α, θ, μ) where θ denotes the baseline G
family parameters which contain the shape and scale parameters. The parameter μ is the location parameter.
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2.2 Kumaraswamy G distribution 


Cordeiro and Castro (2011) proposed the Kumaraswamy G distribution. The probability density function of
the Kumaraswamy G distribution is given below:

$$f(x,\Theta) = \alpha\beta\, h(x-\mu,\theta)\left(H(x-\mu,\theta)\right)^{\alpha-1}\left[1 - \left(H(x-\mu,\theta)\right)^{\alpha}\right]^{\beta-1}$$

where θ is the baseline family parameter vector, and α > 0, β > 0 and μ are extra parameters induced to the
baseline cumulative distribution function H whose probability density function is h. The general form for the
cumulative distribution function of the Kumaraswamy G distribution is given by,

$$F(x,\Theta) = 1 - \left[1 - \left(H(x-\mu,\theta)\right)^{\alpha}\right]^{\beta}$$

The baseline H refers to the cumulative distribution function of different families such as Chen, Frechet,
Log-normal and Weibull distributions. The parameter vector is Θ = (α, β, θ, μ) where θ denotes the baseline
G family parameters which contain the shape and scale parameters. In this model α and β are the first and
second shape parameters respectively and μ is the location parameter.


2.3 Marshall-Olkin G distribution 


Marshall and Olkin (1997) proposed a G family of distributions which is known as the Marshall-Olkin
G distribution. The probability density function of this distribution is given below:

$$f(x,\Theta) = \frac{\alpha\, h(x-\mu,\theta)}{\left[1 - (1-\alpha)\left(1 - H(x-\mu,\theta)\right)\right]^2}$$

where θ is the baseline family parameter vector, and α > 0 and μ are extra parameters induced to the baseline
cumulative distribution function H whose probability density function is h. The cumulative distribution
function of the Marshall-Olkin G distribution is given by,

$$F(x,\Theta) = 1 - \frac{\alpha\left(1 - H(x-\mu,\theta)\right)}{1 - (1-\alpha)\left(1 - H(x-\mu,\theta)\right)}$$

The baseline H refers to the cumulative distribution function of different families such as Chen, Frechet,
Log-normal and Weibull distributions. The parameter vector is Θ = (α, θ, μ) where θ denotes the baseline G
family parameters which contain the shape and scale parameters and μ is the location parameter.


2.4 Weibull G distribution 


The Weibull G distribution was proposed by Alzaatreh et al. (2013). The general form for the probability
density function of this distribution can be written as:

$$f(x,\Theta) = \frac{\alpha}{\beta^{\alpha}}\, \frac{h(x-\mu,\theta)}{1 - H(x-\mu,\theta)} \left[-\log\left(1 - H(x-\mu,\theta)\right)\right]^{\alpha-1} \exp\left\{-\left(\frac{-\log\left(1 - H(x-\mu,\theta)\right)}{\beta}\right)^{\alpha}\right\}$$

where θ is the baseline family parameter vector, and α > 0, β > 0 and μ are extra parameters induced to the
baseline cumulative distribution function H whose probability density function is h. The general form for the
cumulative distribution function of the Weibull G distribution is given by,

$$F(x,\Theta) = 1 - \exp\left\{-\left(\frac{-\log\left(1 - H(x-\mu,\theta)\right)}{\beta}\right)^{\alpha}\right\}$$

The Weibull G distribution is a special case of the Weibull-X family of Alzaatreh et al. (2013).
The baseline H refers to the cumulative distribution functions of different families such as Chen, Frechet,
Log-normal and Weibull distributions. The parameter vector is Θ = (α, β, θ, μ) where θ denotes the baseline G
family parameters which contain the shape and scale parameters and μ is the location parameter.
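The three families with closed-form distribution functions can be sketched compactly. The code below is an illustration, not the MPS package: it assumes a Weibull baseline H(z) = 1 − exp(−(z/scale)^shape), the standard Kumaraswamy G, Marshall-Olkin G and Weibull G forms, and hypothetical function names, and it is only meant for points x > μ. The Gamma Uniform G family is left out because its distribution function needs the incomplete gamma function, which Python's `math` module does not provide.

```python
import math

def weibull_cdf(z, shape, scale):
    # Assumed baseline cdf H(z) for z > 0
    return 1.0 - math.exp(-((z / scale) ** shape))

def weibull_pdf(z, shape, scale):
    t = (z / scale) ** shape
    return shape / z * t * math.exp(-t)

def kumaraswamy_g(x, alpha, beta, shape, scale, mu=0.0):
    """Return (pdf, cdf) of the Kumaraswamy G distribution at x > mu."""
    H, h = weibull_cdf(x - mu, shape, scale), weibull_pdf(x - mu, shape, scale)
    pdf = alpha * beta * h * H ** (alpha - 1) * (1 - H ** alpha) ** (beta - 1)
    return pdf, 1 - (1 - H ** alpha) ** beta

def marshall_olkin_g(x, alpha, shape, scale, mu=0.0):
    """Return (pdf, cdf) of the Marshall-Olkin G distribution at x > mu."""
    H, h = weibull_cdf(x - mu, shape, scale), weibull_pdf(x - mu, shape, scale)
    denom = 1 - (1 - alpha) * (1 - H)
    return alpha * h / denom ** 2, 1 - alpha * (1 - H) / denom

def weibull_g(x, alpha, beta, shape, scale, mu=0.0):
    """Return (pdf, cdf) of the Weibull G distribution at x > mu."""
    H, h = weibull_cdf(x - mu, shape, scale), weibull_pdf(x - mu, shape, scale)
    t = -math.log(1 - H)
    tail = math.exp(-((t / beta) ** alpha))
    pdf = alpha / beta ** alpha * h / (1 - H) * t ** (alpha - 1) * tail
    return pdf, 1 - tail

# Illustrative parameter values only
print(kumaraswamy_g(1.3, 1.7, 0.8, 1.5, 2.0, mu=0.2))
print(marshall_olkin_g(1.3, 1.7, 1.5, 2.0, mu=0.2))
print(weibull_g(1.3, 1.7, 0.8, 1.5, 2.0, mu=0.2))
```

Returning the density and distribution function together makes it easy to check one against the other numerically, for example by differencing the distribution function.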
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2.5 Model evaluation statistics 


Selecting the appropriate model is a challenging task for researchers. The following statistics are widely
used for assessing goodness of fit: the Akaike Information Criterion (AIC), the Bayesian Information
Criterion (BIC), the Anderson-Darling (AD) statistic, the Log-Likelihood (LL) statistic, and the
Kolmogorov-Smirnov (KS) test statistic with its corresponding p-value. Smirnov (1948) introduced the
table for estimating the goodness of fit for different empirical distributions.


2.5.1 Akaike Information Criterion (AIC) 


The AIC (Akaike Information Criterion) is a method for scoring and selecting a model. It is derived
from information theory and frequentist inference. The AIC statistic is defined as follows:

$$\mathrm{AIC} = -\frac{2}{N}\,LL + \frac{2k}{N}$$

where N is the sample size, LL is the log-likelihood value of the model and k is the number of parameters.
In model selection, the model with the lowest AIC value is preferred (Brownlee, 2019).


2.5.2 Bayesian Information Criterion (BIC) 


The BIC (Bayesian Information Criterion) was derived from Bayesian probability and inference
and it is appropriate for models fitted in the maximum likelihood estimation framework. The
formula for the BIC statistic is given below:

$$\mathrm{BIC} = -2\,LL + \ln(N)\,k$$

In this equation LL represents the log-likelihood value of the model, N and k are the sample size and
number of parameters respectively, and ln represents the natural logarithm with base e. The model with the
minimum value of BIC is preferred (Brownlee 2019).


2.5.3 Log-likelihood (LL) 


The likelihood measures how well a parameter value explains the observed data. Working with the
logarithm of the likelihood function makes maximum likelihood estimation computationally simple
(Robinson 2016). Let $x_1, x_2, \ldots, x_n$ be independent and identically distributed random variables with
probability density function f(x). Then their joint density is,

$$f(x_1, x_2, \ldots, x_n) = f(x_1)\,f(x_2)\cdots f(x_n) = \prod_{i=1}^n f(x_i)$$

The log of any product is the sum of the logs of the multiplied terms, so the logarithm of this expression can
be written as $\sum_{i=1}^n \ln f(x_i)$. This relationship can be shown as l(Θ) = ln[L(Θ)]. The highest value of
the log-likelihood statistic indicates the better fit.
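The log-likelihood and the two criteria built on it can be computed as follows. This is a minimal sketch under stated assumptions: the AIC is taken in the sample-size-scaled form −(2/N)·LL + 2k/N suggested by the definitions of N, LL and k in Section 2.5.1, the BIC as −2·LL + ln(N)·k, and the exponential model, the function names and the sample values are invented for the illustration.

```python
import math

def log_likelihood(sample, pdf):
    """Sum of log densities, i.e. the log of the product of f(x_i)."""
    return sum(math.log(pdf(x)) for x in sample)

def aic_scaled(ll, k, n):
    # Assumed scaled form: -(2/N) * LL + 2 * k / N
    return -2.0 / n * ll + 2.0 * k / n

def bic(ll, k, n):
    return -2.0 * ll + math.log(n) * k

sample = [0.4, 1.1, 2.3, 0.7, 1.6]   # made-up values, for illustration only
rate = len(sample) / sum(sample)     # exponential maximum likelihood estimate
ll = log_likelihood(sample, lambda x: rate * math.exp(-rate * x))
print(ll, aic_scaled(ll, 1, len(sample)), bic(ll, 1, len(sample)))
```

Since the scaled AIC only divides the usual −2·LL + 2k by N, it ranks models fitted to the same data in the same order as the unscaled version.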


2.5.4 Anderson-Darling (AD) Test 


In 1952 Anderson and Darling introduced the Anderson-Darling (AD) test (Anderson and Darling, 1952,
1954). This test is treated as an alternative test for detecting sample distribution departure from normality.
The mathematical formula for the one sample AD test is defined as:

$$AD = -n - \frac{1}{n}\sum_{i=1}^n (2i-1)\left[\ln F\left(x_{(i)}\right) + \ln\left(1 - F\left(x_{(n+1-i)}\right)\right)\right]$$

where $\{x_{(1)}, x_{(2)}, \ldots, x_{(n)}\}$ is the ordered sample with sample size n and F(x) is the cumulative distribution
to which the sample is compared. The null hypothesis that $\{x_{(1)} < x_{(2)} < \cdots < x_{(n)}\}$ comes from the underlying
distribution F(x) is rejected if AD is larger than the critical value $AD_{\alpha}$. The critical values for different sample
sizes are given by D'Agostino and Stephens (1986). Engmann and Cousineau (2011) introduced a two sample
Anderson-Darling (AD) test statistic for a goodness of fit test for comparing distributions.
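The one sample AD statistic is straightforward to compute once the hypothesised distribution function is available. The sketch below assumes the standard form AD = −n − (1/n) Σ (2i − 1)[ln F(x_(i)) + ln(1 − F(x_(n+1−i)))]; the function names and the small uniform example are illustrative, and an algebraically equivalent single-index form is included only as a cross-check.

```python
import math

def anderson_darling(sample, cdf):
    """One sample AD statistic against the hypothesised cdf."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        # xs[i - 1] is x_(i); xs[n - i] is x_(n + 1 - i)
        total += (2 * i - 1) * (math.log(cdf(xs[i - 1])) + math.log(1.0 - cdf(xs[n - i])))
    return -n - total / n

def anderson_darling_alt(sample, cdf):
    """Equivalent form using a single order statistic per term."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        f = cdf(xs[i - 1])
        total += (2 * i - 1) * math.log(f) + (2 * n - 2 * i + 1) * math.log(1.0 - f)
    return -n - total / n

points = [0.10, 0.25, 0.40, 0.70, 0.90]   # made-up sample on (0, 1)
print(anderson_darling(points, lambda x: x))
```

Both logarithms diverge when the fitted distribution function equals 0 or 1 at an observation, so observations on the boundary of the support need separate handling in practice.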


2.5.5 Kolmogorov-Smirnoff (KS) test


Kolmogorov (1933, 1941) and Smirnoff (1939) proposed the Kolmogorov-Smirnoff (KS) test statistic as a
measure of the distance between the empirical distribution and the postulated theoretical distribution. The KS test
statistic for a given theoretical cumulative distribution function F(x) is,

$$KS = \sqrt{n}\,\sup_x \left|F_n(x) - F(x)\right|$$

where F(x) is the theoretical cumulative distribution function and $F_n(x)$ is the empirical cumulative
distribution function for a sample of size n. The null hypothesis is rejected if the observed value of the KS test
statistic is larger than the corresponding critical value (Massey, 1951).


3. Case study 


Rajshahi Division is one of the oldest administrative divisions of Bangladesh, covering an area of 18,174.4 square
kilometers and consisting of eight districts, namely Rajshahi, Natore, Pabna, Bogura (Bogra), Chapainawabganj,
Joypurhat, Naogaon and Sirajganj. According to the 2011 census the total population of this division
is 18,484,858. Rajshahi is located in the western part of Bangladesh.
The division is surrounded by the Khulna division in the South, Dhaka and Mymensingh in the East,
Rangpur in the North and the West Bengal state of India in the West. The two main rivers, the Padma and the Jamuna,
cross this area, and they have numerous tributaries, including the Atrai, Karatoya and
Mahananda. The land of the study area consists mainly of flat plains which produce a large variety of crops
and vegetables such as rice, wheat, pulses, potatoes, carrots, onions and sugarcane. The administrative map of
Rajshahi division is given in Figure 1.

The data used in this work were collected from the Bangladesh Meteorological Department (BMD). The
daily data cover the period from January, 1971 to December, 2015, giving a total of 16016
observations from the Rajshahi, Bogra and Pabna districts of the Rajshahi division. We converted the daily rainfall
data into monthly rainfall data using Microsoft Excel. In environmental studies missing data are common.
Here the missing values occurred at random, and continuous gaps of one month to several months were also
found in some years. We estimated the missing data using a smoothing technique in SPSS and then
prepared the data for analysis.


4. Results and discussion 


In this study we consider rainfall data from the Rajshahi, Bogura and Pabna districts of Rajshahi division,
Bangladesh. For rainfall data, Papalexious (2012) proposed three steps for choosing a probability distribution:
(1) choose a priori some parametric families of distributions, (2) estimate the parameters with
an appropriate fitting method, and (3) find the best performing model based on some model evaluation criteria.
Recently proposed G-family distributions perform better than traditional probability distributions
in many cases. So, in this study the Gamma Uniform G, Kumaraswamy G, Marshall-Olkin G and Weibull G
families with Chen, Frechet, Log-normal and Weibull baseline distributions are considered. The
summary statistics of the rainfall data for the three districts, Rajshahi, Bogura and Pabna, are given in Table 1 below.
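
Step (3) relies on information criteria. The sketch below uses the standard definitions AIC = 2k - 2 log L and BIC = k log n - 2 log L to rank fitted models; the model names, log-likelihood values and sample size are hypothetical placeholders, not results from this study.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits: name -> (maximised log-likelihood, number of parameters).
fits = {"model_A": (-1700.0, 3), "model_B": (-1698.5, 4)}
n_obs = 540  # hypothetical sample size

# Smaller values indicate a better trade-off between fit and complexity.
ranking = sorted(fits, key=lambda m: aic(*fits[m]))
```

With these placeholder numbers the two criteria disagree: AIC prefers the four-parameter model while BIC, which penalises extra parameters more heavily, prefers the three-parameter one.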

Table 1 shows that the average monthly rainfall for the Rajshahi, Bogura and Pabna districts is 12.71, 6.24
and 6.96 mm, respectively. Mean rainfall is highest in Rajshahi and lowest in Bogura district. All
three rainfall series are positively skewed, with kurtosis less than 3. The parameter estimates
with standard errors and goodness-of-fit tests for Rajshahi district are given in Table 2.

The estimated results in Table 2 indicate that all the parameters of the Gamma uniform Chen,
Gamma uniform Frechet, Gamma uniform Log-normal, Gamma uniform Weibull, Kumaraswamy Chen,
Kumaraswamy Frechet, Kumaraswamy Log-normal, Kumaraswamy Weibull, Marshall-Olkin Chen,

Figure 1: Map of Rajshahi division in Bangladesh (Source: author’s modified).


Table 1: Summary statistics for monthly rainfall data. 


District | Min | First quartile | Median | Mean | Third quartile | Max | Skewness | Kurtosis
Rajshahi | 0.30 | 2.29 | 6.98 | 12.71 | 20.45 | 76.23 | 1.3346 | 1.8765
Bogura | 0.02 | 1.43 | 5.25 | 6.24 | 9.80 | 19.59 | 0.6345 | 0.7543
Pabna | 0.12 | 0.81 | 5.11 | 6.96 | 11.30 | 31.82 | 0.5828 | 0.6590


(Source: authors’ own calculation) 


Marshall-Olkin Frechet, Marshall-Olkin Log-normal, Marshall-Olkin Weibull, Weibull Chen, Weibull
Frechet, Weibull Log-normal and Weibull Weibull distributions meet the criteria for significant parameters.
The Marshall-Olkin Chen distribution produces the lowest standard errors, whereas the Gamma uniform
Frechet distribution produces higher standard errors (SE). The model evaluation criteria AIC and BIC
indicate that the Marshall-Olkin Chen distribution provides the best fit to the
rainfall data of Rajshahi district. The log-likelihood value leads to the same conclusion. The goodness-of-fit
statistics, Anderson-Darling (AD) and Kolmogorov-Smirnov (KS), show that all

Table 2: Parameter estimation and model evaluation statistic for Rajshahi district.


Model | Parameter estimates (SE) | AIC | BIC | -LL | AD | KS (p-value)
Gamma uniform Chen | 4.339 (1.341), 0.0923 (0.528), 0.653 (0.731) | 6975 | 6993 | 3483 | 5.35 | 0.772 (0.001)
Gamma uniform Frechet | 5.432 (1.651), 0.659 (0.987), 0.701 (1.653) | 7096 | 7136 | 3501 | 7.64 | 0.812 (0.010)
Gamma uniform Log-normal | 0.218 (2.446), 0.927 (1.421), 0.365 (0.564) | 6944 | 6962 | 3468 | 4.03 | 0.067 (0.008)
Gamma uniform Weibull | 1.648 (1.564), 0.398 (0.557), 0.956 (0.876) | 6821 | 6838 | 3406 | 6.07 | 0.082 (0.000)
Kumaraswamy Chen | 2.381 (1.112), 0.150 (0.652), 0.422 (0.453), 0.887 (0.765) | 6843 | 6865 | 3416 | 5.0 | 0.058 (0.033)
Kumaraswamy Frechet | 7.622 (1.313), 24.65 (0.870), 2.691 (0.798), 0.549 (0.654) | 7013 | 7035 | 3501 | 6.95 | 0.804 (0.000)
Kumaraswamy Log-normal | 6.353 (1.776), 52.34 (2.321), 6.209 (0.897), 0.576 (0.743) | 6990 | 7012 | 3490 | 5.26 | 0.074 (0.002)
Kumaraswamy Weibull | 0.794 (1.202), 0.899 (0.781), 120.19 (0.387), 0.965 (0.657) | 6876 | 6898 | 3433 | 5.69 | 0.602 (0.019)
Marshall-Olkin Chen | 1.769 (1.023), 0.051 (0.045), 0.786 (0.121) | 6739 | 6756 | 3365 | 3.86 | 0.012 (0.001)
Marshall-Olkin Frechet | 22.82 (1.453), 0.590 (0.450), 0.415 (0.540) | 7051 | 7068 | 3521 | 1:39 | 0.079 (0.001)
Marshall-Olkin Log-normal | 3.101 (2.541), 1.460 (1.242), 0.173 (0.909) | 7005 | 7023 | 3499 | 5.88 | 0.071 (0.004)
Marshall-Olkin Weibull | 0.969 (1.876), 115.18 (0.896), 0.866 (0.887) | 6883 | 6900 | 3437 | 6.07 | 0.061 (0.021)
Weibull Chen | 2.405 (1.432), 0.559 (0.566), 0.082 (0.098), 0.424 (0.766) | 6993 | 7015 | 3492 | 5.92 | 0.087 (0.000)
Weibull Frechet | 5.262 (1.856), 10.71 (1.234), 0.134 (0.754), 0.131 (0.562) | 6992 | 7014 | 3491 | 6.81 | 0.085 (0.002)
Weibull Log-normal | 16.99 (1.760), 18.65 (0.765), 6.54 (0.543), 0.913 (0.656) | 6969 | 6991 | 3479 | 5.29 | 0.078 (0.002)
Weibull Weibull | 0.901 (1.558), 0.739 (0.665), 15.21 (0.543), 0.994 (0.356) | 6909 | 6931 | 3450 | 6.14 | 0.065 (0.009)


(Source: authors’ own calculation) 


these models are significant. Comparing models, the Marshall-Olkin Chen model produces the lowest
value of the Anderson-Darling test statistic and the Gamma uniform Frechet model produces the highest
value. Across the G families, different baseline distributions (Chen, Frechet, Log-normal and Weibull) give
the better fit depending on the family. The Q-Q plots of these models for the Rajshahi district are
shown in Figure 2. The Q-Q plots also confirm that the Marshall-Olkin Chen model provides a better fit
to the monthly rainfall data of the Rajshahi district.

Figure 2: The Q-Q plot for the Rajshahi district (Source: created by authors)

All sixteen models were also fitted to the monthly rainfall data of the Bogura district of Bangladesh,
and the estimated results are listed in Table 3.

The estimated results in Table 3 indicate that most of the parameters meet the criteria for the parameters
of the original distribution. Among these models, the estimated parameters from the Gamma uniform Weibull

Table 3: Parameter estimation and model evaluation statistic for Bogura district.


Model | Parameter estimates (SE) | AIC | BIC | -LL | AD | KS (p-value)
Gamma uniform Chen | 1.791 (0.987), 0.248 (0.564), 0.017 (0.865) | 3337 | 3354 | 1664 | 4.55 | 0.076 (0.001)
Gamma uniform Frechet | 1.153 (1.121), 3.692 (0.754), 0.734 (0.877) | 3452 | 3470 | 1722 | 12.2 | 0.126 (0.000)
Gamma uniform Log-normal | 0.506 (0.980), 1.052 (0.643), 0.145 (0.789) | 3387 | 3404 | 1689 | 10.5 | 0.124 (0.000)
Gamma uniform Weibull | 0.284 (0.325), 16.67 (0.456), 0.016 (0.665) | 3305 | 3312 | 1649 | 2.05 | 0.053 (0.004)
Kumaraswamy Chen | 1.740 (1.110), 0.117 (0.769), 0.946 (0.632), 0.008 (0.056) | 3347 | 3369 | 1668 | 5.51 | 0.821 (0.000)
Kumaraswamy Frechet | 27.51 (1.716), 17.04 (0.879), 0.144 (0.709), 2.234 (0.234) | 3509 | 3531 | 1749 | 12.2 | 0.12 (0.000)
Kumaraswamy Log-normal | 0.155 (0.087), 0.891 (0.311), 0.107 (0.587), 0.441 (0.212) | 3401 | 3423 | 1695 | 13.1 | 0.136 (0.000)
Kumaraswamy Weibull | 0.289 (0.843), 0.454 (0.301), 7.041 (0.476), 0.016 (0.543) | 3327 | 3349 | 1659 | 3.23 | 0.063 (0.016)
Marshall-Olkin Chen | 2.213 (0.856), 0.181 (0.451), 0.018 (0.122) | 3348 | 3366 | 1670 | 4.33 | 0.070 (0.005)
Marshall-Olkin Frechet | 7.26 (1.980), 2.24 (0.881), 1.58 (0.786) | 3529 | 3546 | 1760 | 13.3 | 0.11 (0.000)
Marshall-Olkin Log-normal | 8.87 (1.102), 1.27 (0.421), 0.059 (0.431) | 3452 | 3470 | 1722 | 11 | 0.103 (0.000)
Marshall-Olkin Weibull | 2511 (0.999), 3.767 (0.571), 0.011 (0.010) | 3376 | 3394 | 1684 | 7.34 | 0.095 (0.000)
Weibull Chen | 1.252 (1.210), 2.381 (0.656), 0.288 (0.657), 0.016 (0.332) | 3345 | 3367 | 1668 | 5.35 | 0.084 (0.000)
Weibull Frechet | 9.504 (0.897), 0.648 (0.660), 0.381 (0.613), 0.053 (0.229) | 3414 | 3436 | 1702 | 10.9 | 0.120 (0.000)
Weibull Log-normal | 24.52 (111), 1.382 (0.543), 19.62 (0.587), 0.014 (0.213) | 3394 | 3416 | 1692 | 10.1 | 0.105 (0.000)
Weibull Weibull | 1.035 (0.867), 0.837 (0.430), 7.595 (0.555), 0.071 (0.056) | 3391 | 3413 | 1690 | 8.85 | 0.111 (0.000)


(Source: authors’ own calculation) 


distribution produce the lowest standard errors, whereas the Marshall-Olkin Frechet shows higher standard
errors. The lowest values of the model evaluation criteria AIC and BIC are obtained
from the Gamma uniform Weibull distribution, which confirms its best performance. The same model
also gives the highest log-likelihood value, whereas the Marshall-Olkin Frechet
produces the highest AIC and BIC values and the lowest log-likelihood value. The goodness-of-fit test statistics
AD and KS show that most of the models are statistically significant. Comparing the performance
of these models, both test statistics show that the Gamma uniform Weibull distribution is the most suitable
distribution and gives the best fit, whereas the Marshall-Olkin Frechet distribution gives a worse
fit. A similar result is obtained from the Q-Q plots of these distributions in Figure 3.

Figure 3: The Q-Q plot for Bogura district (Source: created by authors)


For Pabna, the Gamma uniform G, Kumaraswamy G, Marshall-Olkin G
and Weibull G families with four baseline distributions (Chen, Frechet, Log-normal and Weibull)
were fitted to the monthly rainfall data. The estimated parameter values with their standard errors and
model evaluation criteria are given in Table 4.

The estimated results in Table 4 indicate that all the parameters satisfy the conditions for the parameters
of the G-family distributions. The Marshall-Olkin Chen model gives lower standard errors and the Weibull Chen
model gives higher standard errors. The AIC and BIC criteria may be used as relative goodness-of-fit measures,
with the lowest values indicating the best fitted model. These two criteria show the lowest

Table 4: Parameter estimation and model evaluation statistic for Pabna district.


Model | Parameter estimates (SE) | AIC | BIC | -LL | AD | KS (p-value)
Gamma uniform Chen | 4.423 (1.434), 0.767 (0.556), 0.034 (0.121) | 3441 | 3458 | 1716 | 8.62 | 0.105 (0.000)
Gamma uniform Frechet | 1.821 (1.501), 0.586 (0.631), 0.055 (0.212) | 3529 | 3546 | 1760 | 14.3 | 0.131 (0.000)
Gamma uniform Log-normal | 0.558 (1.398), 1.618 (0.480), 0.011 (0.187) | 3452 | 3470 | 1722 | 10.9 | 0.123 (0.001)
Gamma uniform Weibull | 1.049 (1.222), 11.610 (0.497), 0.654 (0.212) | 3398 | 3416 | 1695 | 3.55 | 0.098 (0.001)
Kumaraswamy Chen | 1.374 (1.029), 0.254 (0.565), 0.691 (0.641), 0.014 (0.165) | 3362 | 3384 | 1676 | 5.61 | 0.078 (0.001)
Kumaraswamy Frechet | 11.34 (1.616), 4.849 (0.632), 0.130 (0.417), 0.155 (0.172) | 3618 | 3640 | 1804 | 18.3 | 0.14 (0.000)
Kumaraswamy Log-normal | 0.223 (1.566), 1.299 (0.457), 1.141 (0.390), 0.004 (0.012) | 3552 | 3574 | 1771 | 12 | 0.108 (0.002)
Kumaraswamy Weibull | 0.262 (0.989), 0.656 (0.563), 10.211 (0.453), 0.005 (0.012) | 3383 | 3405 | 1687 | 4.59 | 0.073 (0.003)
Marshall-Olkin Chen | 1.616 (1.087), 0.204 (0.267), 0.114 (0.011) | 3326 | 3343 | 1659 | 5.01 | 0.049 (0.000)
Marshall-Olkin Frechet | 5.223 (1.451), 0.065 (0.765), 0.030 (0.211) | 3603 | 3621 | 1798 | 15.2 | 0.117 (0.000)
Marshall-Olkin Log-normal | 10.85 (1.232), 1.861 (0.546), 0.032 (0.145) | 3514 | 3532 | 1753 | 11.1 | 0.098 (0.001)
Marshall-Olkin Weibull | 2.326 (1.243), 3.121 (0.551), 0.130 (0.125) | 3437 | 3455 | 1715 | 7.78 | 0.095 (0.000)
Weibull Chen | 0.503 (1.996), 1.952 (0.765), 0.778 (0.889), 0.124 (0.332) | 3763 | 3785 | 1877 | 16.6 | 0.333 (0.000)
Weibull Frechet | 11.49 (1.324), 0.507 (0.667), 1.298 (0.687), 0.136 (0.432) | 3437 | 3496 | 1732 | 8.74 | 0.11 (0.001)
Weibull Log-normal | 15.40 (1.245), 2.27 (0.556), 1.57 (0.530), 3.84 (0.232) | 3451 | 3473 | 1721 | 8.48 | 0.105 (0.001)
Weibull Weibull | 1.123 (1.123), 0.794 (0.332), 8.470 (0.352), 0.056 (0.356) | 3450 | 3472 | 1720 | 8.95 | 0.106 (0.000)


(Source: authors’ own calculation) 


value in the case of the Marshall-Olkin Chen model, and the same model gives the highest log-likelihood value.
The AD and KS test statistics indicate that the Marshall-Olkin Chen model outperforms the others. So, the
Marshall-Olkin Chen model is the most suitable model for fitting the monthly rainfall data of Pabna district. The
Q-Q plot in Figure 4 also shows a similar performance.


5. Conclusion 


The study of the distribution of rainfall is important for the development of the economy. Although classical statistical
distributions perform well in many cases, they have some restrictions or limitations. To overcome these
restrictions, G-family distributions were introduced. In this study the Gamma uniform Chen, Gamma uniform
Frechet, Gamma uniform Log-normal, Gamma uniform Weibull, Kumaraswamy Chen, Kumaraswamy
Frechet, Kumaraswamy Log-normal, Kumaraswamy Weibull, Marshall-Olkin Chen, Marshall-Olkin Frechet,
Marshall-Olkin Log-normal, Marshall-Olkin Weibull, Weibull Chen, Weibull Frechet, Weibull Log-normal
and Weibull Weibull distributions have been used for modeling monthly rainfall data for the period
January 1971 to December 2015 for the Rajshahi, Bogura and Pabna districts. The model evaluation
criteria indicate that the Marshall-Olkin Chen distribution gives the best fit for the Rajshahi
and Pabna districts. The Gamma uniform Weibull distribution gives the best fit among all sixteen
G-family distributions for the Bogura district. As baseline distributions, the Chen and Weibull distributions
provide better results than the Frechet and Log-normal distributions in most cases.

Figure 4: The Q-Q plot for Pabna district (Source: created by authors).

In our study we consider only three districts of the Rajshahi division, Rajshahi, Bogura and Pabna, and
could not find a single G-family distribution that gives the best fit for all three districts. So, we
need to investigate more G-family distributions and baseline distributions and also consider rainfall data
from other districts. This is left for future work.

Chapter 14


Record-Based Transmuted Kumaraswamy 
Generalized Family of Distributions 
Properties and Application 

Qazi J Azhad, Mohd Arshad, Bhagwati Devi, Nancy Khandelwal and Irfan Ali


1. Introduction 


The Kumaraswamy probability distribution is continuous and is defined on a double-bounded support. It is the
most popular alternative to the beta distribution and shares many of its properties. One of the most striking
differences between the two is that the Kumaraswamy cumulative distribution function is available in closed
form, so the quantile function is much easier to calculate and work with. This property
makes the Kumaraswamy distribution more suitable for computational work based on simulation. As the
computational side of research has grown rapidly during the past two decades, the use of the
Kumaraswamy distribution has gained momentum and popularity in the research community because it is
easy and efficient to work with. Since programming with complex distributional forms is
difficult, the availability of the Kumaraswamy distribution in place of the beta distribution has made it easier
for researchers to model data. The Kumaraswamy distribution was first used for modeling hydrological
phenomena (Kumaraswamy (1980)), but it has since found extensive uses. Readers can consult
Courard-Hauri (2007), Ganji et al. (2006) and Sanchez et al. (2007), among others, for more insight.

Researchers nowadays focus on modeling complex data using more flexible forms of
distributions. To obtain such distributions, several generalizations of the Kumaraswamy, Weibull, Rayleigh,
gamma, exponential and many other distributions have been proposed. Despite this plethora of models,
more models are still needed to meet the demands of today’s world, in which large datasets arrive in complex
forms. Some of the most popular techniques for constructing new probability models were studied by
Balakrishnan and Ristić (2016), He et al. (2016), Tahir et al. (2020), Ghosh et al. (2021) and others.

In this chapter, we adopt the record-based transmuted map recently proposed by Balakrishnan and He (2021) to generate
new probability models. Let $X_1, X_2, X_3, \ldots$ be a sequence of independent and identically
distributed (iid) random variables with distribution function (DF) $G(\cdot)$. Let $X_{U(1)}$ and $X_{U(2)}$ be
the first two upper records from this sequence. Then define a new random variable
as

\[
Y = \begin{cases} X_{U(1)}, & \text{with probability } 1 - p, \\ X_{U(2)}, & \text{with probability } p, \end{cases}
\]

where $0 < p < 1$. The DF of the random variable $Y$ is then easily obtained as

\[
\begin{aligned}
F_Y(x) &= (1 - p)\,P(X_{U(1)} \le x) + p\,P(X_{U(2)} \le x) \\
&= (1 - p)\,G(x) + p \int_{-\infty}^{x} g(u)\,[-\log \bar{G}(u)]\,du \\
&= (1 - p)\,G(x) + p\,[1 - \bar{G}(x)(1 - \log \bar{G}(x))] \\
&= G(x) + p\,\bar{G}(x) \log \bar{G}(x), \quad x \in \mathbb{R},
\end{aligned}
\]

where $\bar{G}(x) = 1 - G(x)$ denotes the survival function of the baseline distribution.
The probability density function and the hazard rate function are given, respectively, as

\[ f_Y(x) = g(x)\,[1 - p - p \log \bar{G}(x)], \quad x \in \mathbb{R}, \tag{1} \]

and

\[ h_Y(x) = h_G(x)\,\frac{1 - p - p \log \bar{G}(x)}{1 - p \log \bar{G}(x)}, \quad x \in \mathbb{R}, \tag{2} \]

where $g(x)$ is the pdf of the baseline distribution and $h_G(x) = g(x)/\bar{G}(x)$ is the hazard function of the baseline
distribution. We will utilize the above class of distributions to introduce the Record-Based Transmuted
Kumaraswamy Generalized Family of Distributions, which is discussed in the next section.

2. Record-based transmuted Kumaraswamy generalized family of distributions


Let us consider the Kumaraswamy distribution with parameters $(\alpha, \theta)$ as the baseline distribution. Its
distribution function, in generalized form, is defined as

\[ G(x) = 1 - [1 - h(x)^{\alpha}]^{\theta}, \quad x \in (0,1). \tag{3} \]

The baseline distribution $G(x)$ defined in (3) is a generalized form of the Kumaraswamy distribution,
which itself involves another baseline distribution $h(x)$. By considering different
baselines $h(x)$, we obtain new forms of probability distributions based on the record-based transmuted map. In this
chapter, we take $h(x) = x$, which reduces $G(x)$ to the Kumaraswamy distribution. The distribution and
density functions then become

\[ G(x) = 1 - [1 - x^{\alpha}]^{\theta}, \quad x \in (0,1), \tag{4} \]
\[ g(x) = \alpha\theta\, x^{\alpha-1} [1 - x^{\alpha}]^{\theta-1}, \quad x \in (0,1). \tag{5} \]

Substituting the DF and density of the baseline distribution defined in (4) and (5) into (1), we get

\[ f_Y(x) = \alpha\theta\, x^{\alpha-1} [1 - x^{\alpha}]^{\theta-1} \{1 - p[1 + \theta \log(1 - x^{\alpha})]\}, \quad x \in (0,1),\ \alpha > 0,\ \theta > 0,\ p \in [0,1]. \tag{6} \]

The corresponding DF is given as

\[ F_Y(x) = 1 - [1 - x^{\alpha}]^{\theta} + p\,[1 - x^{\alpha}]^{\theta} \log[1 - x^{\alpha}]^{\theta}, \quad x \in (0,1),\ \alpha > 0,\ \theta > 0,\ p \in [0,1]. \tag{7} \]
The probability distribution defined in (6) and (7) is called the Record-Based Transmuted Kumaraswamy
(RTGK) distribution. We now visualize the shape of the probability density and distribution functions of the
RTGK distribution.

We have plotted the DF for different parameter settings. In Figure 1, we fix $\alpha = 2$ and
$\theta = 5$ and take $p = (0.1, 0.3, 0.5, 0.7, 0.9)$, and we also fix $\theta = 4$ and $p = 0.3$ and plot the function
for varying values of $\alpha$. In Figure 2, the 2D and 3D plots of the probability density function for various
configurations of parameters are depicted.
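
The density and distribution function of the RTGK distribution are simple to evaluate numerically. The following is a minimal sketch, assuming the parameter names `alpha`, `theta` and `p` for the three parameters; the function names are illustrative and the chosen parameter values are only an example. It also checks numerically that the density is the derivative of the distribution function.

```python
import math

def rtgk_cdf(x, alpha, theta, p):
    """DF of the RTGK distribution for 0 < x < 1."""
    s = (1.0 - x ** alpha) ** theta          # baseline survival function
    return 1.0 - s + p * s * math.log(s)

def rtgk_pdf(x, alpha, theta, p):
    """Density of the RTGK distribution for 0 < x < 1."""
    u = 1.0 - x ** alpha
    base = alpha * theta * x ** (alpha - 1.0) * u ** (theta - 1.0)
    return base * (1.0 - p * (1.0 + theta * math.log(u)))

# Illustrative parameter values only.
alpha, theta, p = 2.0, 5.0, 0.3
x, h = 0.4, 1e-6

# Central difference of the DF, to be compared with the density at x.
numeric_slope = (rtgk_cdf(x + h, alpha, theta, p)
                 - rtgk_cdf(x - h, alpha, theta, p)) / (2 * h)
```

Evaluating `rtgk_pdf` on a grid of `x` values for several `p` or `alpha` gives curves of the kind described for the figures.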

Figure 2: 2D and 3D probability density plots of RTGK distribution.

3. Distributional properties

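Theorem 3.1 $f_Y(x)$ defined in (6) is a proper density function.
Proof: The proof is straightforward. Since $\log(1 - x^{\alpha}) \le 0$ on $(0,1)$, the factor $1 - p[1 + \theta \log(1 - x^{\alpha})]$
is at least $1 - p \ge 0$, so $f_Y(x) \ge 0$; and from (7), $F_Y(x) \to 0$ as $x \to 0$ and $F_Y(x) \to 1$ as $x \to 1$, so $f_Y$ integrates to one.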


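Theorem 3.2 The moment generating function (mgf) of a random variable X following the RTGK distribution is
given by

\[ M_X(t) = \sum_{i=0}^{\infty} \frac{t^i}{i!}\, \theta\, \mathrm{Beta}\!\left(\frac{i}{\alpha} + 1, \theta\right) \left\{ 1 - p\left( 1 + \theta\left[ \psi(\theta) - \psi\!\left(\frac{i}{\alpha} + 1 + \theta\right) \right] \right) \right\}, \tag{8} \]

where $\psi(\cdot) = \Gamma'(\cdot)/\Gamma(\cdot)$ is the digamma function.

Proof: From the definition of the moment generating function, we have

\[
\begin{aligned}
M_X(t) &= E[e^{tX}] \\
&= \int_0^1 e^{tx}\, \alpha\theta\, x^{\alpha-1} [1 - x^{\alpha}]^{\theta-1} \{1 - p[1 + \theta \log(1 - x^{\alpha})]\}\, dx \\
&= \sum_{i=0}^{\infty} \frac{t^i}{i!} \left\{ \int_0^1 \alpha\theta\, x^{\alpha+i-1} [1 - x^{\alpha}]^{\theta-1}\, dx - p \int_0^1 \alpha\theta\, x^{\alpha+i-1} [1 - x^{\alpha}]^{\theta-1} [1 + \theta \log(1 - x^{\alpha})]\, dx \right\}.
\end{aligned}
\]

Now, substituting $x^{\alpha} = u$ in the first integral and $1 - x^{\alpha} = z$ in the second, we get the required
result given in (8).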


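Theorem 3.3 The $r$th moment about the origin of a random variable X following the RTGK distribution is
given by

\[ \mu'_r = \theta\, \mathrm{Beta}\!\left(\frac{r}{\alpha} + 1, \theta\right) \left\{ 1 - p\left( 1 + \theta\left[ \psi(\theta) - \psi\!\left(\frac{r}{\alpha} + 1 + \theta\right) \right] \right) \right\}, \]

with $\psi(\cdot)$ the digamma function as in Theorem 3.2.

Proof: The proof follows along the same lines as the proof of Theorem 3.2.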
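
The raw-moment formula of Theorem 3.3 can be evaluated with the standard library alone. The sketch below is one way to do so, reading the formula as theta * Beta(r/alpha + 1, theta) * {1 - p(1 + theta[psi(theta) - psi(r/alpha + 1 + theta)])}. The helper `digamma` is a hand-written approximation (upward recurrence followed by an asymptotic series) and all function names are illustrative assumptions. The last line evaluates the first moment at the parameter values quoted in Section 7, which should be close to the first raw moment reported there.

```python
import math

def digamma(x):
    """Approximate digamma for x > 0: recurrence up to x >= 10, then a series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def beta_fn(a, b):
    """Beta function computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def rtgk_raw_moment(r, alpha, theta, p):
    """r-th raw moment of the RTGK distribution, following Theorem 3.3."""
    a = r / alpha + 1.0
    bracket = 1.0 + theta * (digamma(theta) - digamma(a + theta))
    return theta * beta_fn(a, theta) * (1.0 - p * bracket)

# Parameter values quoted in Section 7 of this chapter.
m1 = rtgk_raw_moment(1, 0.5009893, 1.2976525, 0.1)
```

Setting `p = 0` recovers the raw moments of the Kumaraswamy baseline, which gives a quick sanity check of any implementation.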


4. RTGK distribution in terms of order and record statistic 


Order and record statistics are building blocks of non-parametric inference. This section provides the
density functions of the order and record statistics from the RTGK distribution.
Order and record values, and their analysis for different problems, have been studied by
many researchers over the years. Readers are advised to go through Khan and Arshad (2016), Devi et al.
(2017), Chaturvedi et al. (2019a, 2019b), Sharma et al. (2019), Arshad and Baklizi (2019), Arshad and Jamal
(2019a, 2019b, 2019c), Arshad et al. (2021a, 2021b), Azhad et al. (2021a, 2021b) and Tripathi et al. (2021) for
more insight into the use of generalized distributions and the applications of order and record
values in research problems.

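Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from the RTGK distribution and let $X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)}$
denote the corresponding order statistics. The density function of the $r$th order statistic $X_{(r)}$ $(r = 1, 2, 3, \ldots, n)$ is
given as

\[
\begin{aligned}
f_{X_{(r)}}(x) &= \frac{1}{\mathrm{Beta}(r, n - r + 1)} [F(x)]^{r-1} [1 - F(x)]^{n-r} f(x) \\
&= \frac{f(x)}{\mathrm{Beta}(r, n - r + 1)} \sum_{i=0}^{r-1} \binom{r-1}{i} (-1)^i [1 - F(x)]^{n-r+i} \\
&= \frac{\alpha\theta\, x^{\alpha-1} (1 - x^{\alpha})^{\theta-1} \{1 - p[1 + \theta \log(1 - x^{\alpha})]\}}{\mathrm{Beta}(r, n - r + 1)} \sum_{i=0}^{r-1} \binom{r-1}{i} (-1)^i \left[ (1 - x^{\alpha})^{\theta} \{1 - p \log(1 - x^{\alpha})^{\theta}\} \right]^{n-r+i}.
\end{aligned}
\]

Moreover, the densities of the smallest and the largest order statistics are given, respectively, as

\[ f_{X_{(1)}}(x) = n\,\alpha\theta\, x^{\alpha-1} (1 - x^{\alpha})^{\theta-1} \{1 - p[1 + \theta \log(1 - x^{\alpha})]\} \left[ (1 - x^{\alpha})^{\theta} \{1 - p \log(1 - x^{\alpha})^{\theta}\} \right]^{n-1} \]

and

\[ f_{X_{(n)}}(x) = n\,\alpha\theta\, x^{\alpha-1} (1 - x^{\alpha})^{\theta-1} \{1 - p[1 + \theta \log(1 - x^{\alpha})]\} \sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i \left[ (1 - x^{\alpha})^{\theta} \{1 - p \log(1 - x^{\alpha})^{\theta}\} \right]^{i}. \]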


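For the density of record statistics, let $X_1, X_2, X_3, \ldots$ be a sequence of iid random variables following the RTGK
distribution and let $R_1, R_2, R_3, \ldots, R_n$ denote the corresponding upper record statistics. Then, the density function of the
$n$th upper record statistic $R_n$ is given as

\[
\begin{aligned}
f_{R_n}(r_n) &= \frac{[-\log(1 - F(r_n))]^{n-1} f(r_n)}{(n-1)!} \\
&= \frac{\left\{ -\log\!\left( (1 - r_n^{\alpha})^{\theta} [1 - p\,\theta \log(1 - r_n^{\alpha})] \right) \right\}^{n-1}}{(n-1)!}\, \alpha\theta\, r_n^{\alpha-1} (1 - r_n^{\alpha})^{\theta-1} \{1 - p[1 + \theta \log(1 - r_n^{\alpha})]\}, \quad 0 < r_n < 1.
\end{aligned}
\]


5. Maximum likelihood estimation of parameters


Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size $n$ taken from the RTGK distribution with parameters $\alpha$, $\theta$ and $p$.
The likelihood function is given as

\[ L(\alpha, \theta, p \mid x) = \alpha^n \theta^n \prod_{i=1}^{n} x_i^{\alpha-1} (1 - x_i^{\alpha})^{\theta-1} [1 - p(1 + \theta \log(1 - x_i^{\alpha}))]. \]

Taking logarithms on both sides, we obtain the log-likelihood as

\[ \log L(\alpha, \theta, p \mid x) = n \log \alpha + n \log \theta + \sum_{i=1}^{n} \{ (\alpha - 1) \log x_i + (\theta - 1) \log(1 - x_i^{\alpha}) + \log[1 - p(1 + \theta \log(1 - x_i^{\alpha}))] \}. \tag{9} \]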

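Now, taking the derivatives of equation (9) with respect to $\alpha$, $\theta$ and $p$, we get,
respectively, the partial derivatives

\[ \frac{\partial}{\partial\alpha} \log L = \frac{n}{\alpha} + \sum_{i=1}^{n} \left\{ \log x_i + \frac{(\theta - 1)\, x_i^{\alpha} \log x_i}{x_i^{\alpha} - 1} + \frac{p\,\theta\, x_i^{\alpha} \log x_i}{(x_i^{\alpha} - 1)\,(p(1 + \theta \log(1 - x_i^{\alpha})) - 1)} \right\}, \]

\[ \frac{\partial}{\partial\theta} \log L = \frac{n}{\theta} + \sum_{i=1}^{n} \left\{ \log(1 - x_i^{\alpha}) + \frac{p \log(1 - x_i^{\alpha})}{p(1 + \theta \log(1 - x_i^{\alpha})) - 1} \right\}, \]

\[ \frac{\partial}{\partial p} \log L = \sum_{i=1}^{n} \frac{1 + \theta \log(1 - x_i^{\alpha})}{p(1 + \theta \log(1 - x_i^{\alpha})) - 1}. \]

Now, for the maximum likelihood estimates (MLEs), consider

\[ \frac{\partial}{\partial\alpha} \log L = 0, \quad \frac{\partial}{\partial\theta} \log L = 0, \quad \frac{\partial}{\partial p} \log L = 0. \]

As these equations are nonlinear, solving them by hand is tedious and time
consuming. Hence, we adopt a numerical technique such as the Newton-Raphson
method to find the roots of these equations.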
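
A numerical optimiser needs the log-likelihood and, ideally, its gradient. The sketch below writes both out for the RTGK model, with the three partial derivatives obtained by differentiating the log-likelihood term by term. The function names and the small data vector are hypothetical, and the final lines only compare the analytic gradient with a finite-difference approximation; no optimiser is run here.

```python
import math

def rtgk_loglik(data, alpha, theta, p):
    """Log-likelihood of the RTGK model for observations in (0, 1)."""
    n = len(data)
    total = n * math.log(alpha) + n * math.log(theta)
    for x in data:
        l1 = math.log1p(-(x ** alpha))            # log(1 - x^alpha)
        total += (alpha - 1.0) * math.log(x) + (theta - 1.0) * l1
        total += math.log(1.0 - p * (1.0 + theta * l1))
    return total

def rtgk_score(data, alpha, theta, p):
    """Partial derivatives of the log-likelihood w.r.t. alpha, theta and p."""
    n = len(data)
    d_alpha, d_theta, d_p = n / alpha, n / theta, 0.0
    for x in data:
        xa = x ** alpha
        lx = math.log(x)
        l1 = math.log1p(-xa)
        w = 1.0 - p * (1.0 + theta * l1)          # always >= 1 - p
        d_alpha += lx - (theta - 1.0) * xa * lx / (1.0 - xa)
        d_alpha += p * theta * xa * lx / ((1.0 - xa) * w)
        d_theta += l1 - p * l1 / w
        d_p -= (1.0 + theta * l1) / w
    return d_alpha, d_theta, d_p

# Hypothetical data and parameter values, for checking the gradient only.
data = [0.2, 0.5, 0.8, 0.35, 0.65]
params = (1.5, 2.0, 0.3)
h = 1e-6
numeric = []
for j in range(3):
    up = list(params); up[j] += h
    dn = list(params); dn[j] -= h
    numeric.append((rtgk_loglik(data, *up) - rtgk_loglik(data, *dn)) / (2 * h))
analytic = rtgk_score(data, *params)
```

Both functions can be handed to any quasi-Newton routine that accepts an objective and a gradient, subject to the constraints on the three parameters.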

6. Computational study


In this section, we consider the computational aspects of the problem. We have obtained
descriptive measures such as the mean, variance, skewness ($\gamma_1$) and kurtosis ($\gamma_2$) of the RTGK distribution for
various configurations of parameters and sample sizes. We have also performed a Monte Carlo study to
monitor the performance of the MLEs of the unknown quantities. All computations were carried
out with the aid of R software (R Core Team (2021)). For the numerical computation of the MLEs, we
used the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm developed by Broyden (1970), Fletcher
(1970), Goldfarb (1970), and Shanno (1970). The BFGS method is built into the optim function
provided in R. Tables 1 and 2 present descriptive measures of the RTGK distribution. From these tables,


Table 1: Descriptive Measures of RTGK distribution.


p a 0 Variance 1 Yo 
0.5 | 0.5 0.1156 —1.0849 
1 | | 0.0722 
15 1.1558 
2 2.1252 
2.5 2.9862 
3 1.9490 | 3.7514 
1 ~1.0554 
15 ~0.5040 
2 0.0542 
2.5 0.5747 
3 1.0492 
05 | 15 0.7781 
1 ~1.1420 
15 0.8596 
0.4 
2 ~0.4765 
os -0.0889 
3 0.2774 
05 | 2 0.0275 
1 ~1.0063 
15 0.9291 
2 0.6606 
2.5 0.3530 
3 ~0.0485 
as | 25 0.0786 | 0.0420 | 0.0258 0.9382 
1 0.1915 | 0.1170 | 0.0786 | -0.8061 | 
15 0.2884 | 0.1915 | 0.1362 | 0.9056 | 
2 0.3673 | 0.2583 | 0.1915 | 0.7249 | 
2.5 0.4316 | 0.3166 | 0.2424 | 0.4744 | 
3 0.4845 | 0.3673 | 0.2884 | 0.2120 | 

Table 2: Descriptive Measures of RTGK distribution.


p  α  θ  E(X)  E(X^2)  E(X^3)  E(X^4)  Variance  γ1  γ2

0.5 | 0.5 0.7040 0.5983 | 0.5364 | 0.4938 0.1027 ~0.8927 ~0.5831 
1 0.8000 0.7040 | 0.6426 | 0.5983 0.0640 ~1.4198 1.0287 
15 0.8475 0.7621 | 0.7040 | 0.6607 0.0438 —1.7537 2.4794 
2 0.8764 0.8000 | 0.7457 0.7040 0.0319 ~1.9959 3.7624 
2.5 0.8960 0.8271 | 0.7763 0.7365 0.0243 ~2.1829 4.8960 
3 0.9102 0.8475 | 0.8000 | 0.7621 0.0191 ~2.3328 5.9006 
0.5 1 0.5000 0.3540 | 0.2794 | 0.2330 0.1040 ~0.0481 ~1.3574 
1 0.6500 0.5000 | 0.4125 0.3540 0.0775 ~0.6141 -0.7555 
1.5 0.7292 0.5890 | 0.5000 | 0.4375 0.0572 ~0.9429 0.0045 
2 0.7788 0.6500 | 0.5633 0.5000 0.0434 -1.1721 0.7313 
2.5 0.8129 0.6948 | 0.6117 0.5491 0.0340 ~1.3449 1.3954 
3 0.8378 0.7292 | 0.6500 | 0.5890 0.0273 —1.4815 1.9947 
0.5 | 15 0.3696 0.2195 | 0.1527 0.1154 0.0829 0.4315 -1.0272 
1 0.5440 0.3696 | 0.2769 0.2195 0.0737 ~0.2181 —1.0527 
1.5 0.6423 0.4710 | 0.3696 | 0.3026 0.0585 ~0.5678 ~0.5933 
. 2 0.7055 0.5440 | 0.4411 0.3696 0.0463 ~0.8035 ~0.0786 
2.5 0.7496 0.5991 | 0.4978 0.4248 0.0372 ~0.9779 0.4170 
3 0.7822 0.6423 | 0.5440 | 0.4710 0.0304 —1.1140 0.8757 
0.5 2 0.2833 0.1427 | 0.0879 0.0603 0.0624 0.7785 ~0.4095 
1 0.4667 0.2833 | 0.1940 | 0.1427 0.0656 0.0353 —1.0282 
1.5 0.5763 0.3879 | 0.2833 0.2180 0.0557 ~0.3394 ~0.7778 
2 0.6488 0.4667 | 0.3564 | 0.2833 0.0458 ~0.5849 ~0.3911 
25 0.7001 0.5277 | 0.4166 | 0.3395 0.0376 ~0.7639 0.0098 
3 0.7383 0.5763 | 0.4667 0.3879 0.0312 ~0.9022 0.3926 
0.5 | 2.5 0.2237 0.0966 | 0.0530 0.0331 0.0465 1.0549 0.3276 
1 0.4082 0.2237 | 0.1409 0.0966 0.0571 0.2171 ~0.9009 
1.5 0.5247 0.3270 | 0.2237 0.1626 0.0517 ~0.1821 ~0.8216 
2 0.6035 0.4082 | 0.2953 0.2237 0.0440 ~0.4377 ~0.5282 
2.5 0.6601 0.4726 | 0.3563 0.2785 0.0369 ~0.6216 ~0.1917 
3 0.7026 0.5247 | 0.4082 0.3270 0.0311 ~0.7626 0.1413 


it can be seen clearly that as we increase the value of $\alpha$ for fixed values of $\theta$ and $p$, the variance of the
distribution decreases. Also, for increasing values of $\theta$, the mean and the variance
of the distribution decrease. We also observe that the distribution is mostly negatively skewed for the
given combinations of parameters. Tables 3-5 report the behavior of the ML estimates of the unknown
quantities in terms of bias and MSE. From these tables, we observe that for most estimates
the MSE decreases as the sample size increases.
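
A Monte Carlo study of this kind needs random samples from the RTGK distribution. One possible sampler, sketched below, works directly from the record-based definition: for a continuous baseline, minus the logarithm of the baseline survival function at the first upper record is a standard exponential variable, and at the second upper record it is the sum of two independent standard exponentials. This is an assumption about how samples could be drawn, not a description of the procedure used for Tables 3-5; the function names and parameter values are illustrative.

```python
import math
import random

def rtgk_sample(n, alpha, theta, p, seed=0):
    """Draw n RTGK variates from the record-based mixture representation."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        s = rng.expovariate(1.0)          # first upper record
        if rng.random() < p:
            s += rng.expovariate(1.0)     # switch to the second upper record
        # Invert the Kumaraswamy survival function (1 - x^alpha)^theta = exp(-s).
        out.append((1.0 - math.exp(-s / theta)) ** (1.0 / alpha))
    return out

def rtgk_cdf(x, alpha, theta, p):
    """Closed-form DF of the RTGK distribution, used here as a check."""
    s = (1.0 - x ** alpha) ** theta
    return 1.0 - s + p * s * math.log(s)

# Illustrative parameter values only.
sample = rtgk_sample(20000, 2.0, 3.0, 0.3)
empirical = sum(v <= 0.5 for v in sample) / len(sample)
```

Comparing `empirical` with `rtgk_cdf(0.5, 2.0, 3.0, 0.3)` gives a quick check that the sampler follows the intended distribution; repeating the draw for many seeds and refitting yields bias and MSE estimates.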

Table 3: Bias and MSE of ML estimates for n = 50.


p  α  θ  Bias(α)  Bias(θ)  Bias(p)  MSE(α)  MSE(θ)  MSE(p)
1-5 1.5 —0.2619 0.5249 —0.1623 0.1444 0.4253 0.0348 
2 2 —1.3791 —1.4537 —0.1997 1.9187 2.1399 0.0400 
2 2.5 —1.3853 —1.9788 —0.2000 1.9331 3.9320 0.0400 
2.5 2 —1.8294 —1.3900 —0.1992 3.3699 1.9726 0.0399 
0.3 2.5 2.5 —1.8436 —1.9120 —0.1994 3.4164 3.6847 0.0399 
1.5 2 —0.8176 —1.7694 —0.0631 0.6837 3.1310 0.0156 
1.5 3 —0.7120 —2.6708 0.0612 0.5325 7.1333 0.0085 
2 2 —1.1473 —1.7640 —0.0976 1.3582 3.1120 0.0207 
2 3 —1.0405 —2.6684 0.0286 1.1330 7.1205 0.0094 
1.5 1.5 0.0299 —0.1764 —0.1710 0.0405 0.1561 0.0924 
2 2 —1.2860 —1.3242 —0.4921 1.6798 1.8118 0.2458 
2 2.5, —1.3187 —1.8875 —0.4988 1.7571 3.5920 0.2495 
25 2 —1.7581 —1.2774 —0.4877 3.1182 1.7059 0.2440 
2.5 25 —1.7704 —1.8110 0.4951 3.1586 3.3300 0.2476 
ue 1.5 2 —0.7466 —1.7646 —0.3333 0.5782 3.1141 0.1292 
1.5 3 —0.6812 —2.6510 —0.1685 0.4881 7.0278 0.0510 
2 2 —1.4781 —1.7597 —0.2267 2.2738 3.0968 0.0905 
2 3 —1.3291 —2.6471 —0.0844 1.8893 7.0076 0.0472 
1.5 1.5 0.4259 —1.3038 —0.7369 0.2542 1.7004 0.5516 

Table 4: Bias and MSE of ML estimates for n = 75. 
p  α  θ  Bias(α)  Bias(θ)  Bias(p)  MSE(α)  MSE(θ)  MSE(p)
1.5 1.5 —0.2723 —0.5734 —0.1834 0.1313 0.4218 0.0377 
2 —1.3887 —1.4699 —0.1997 1.9389 2.1753 0.0400 
25 —1.3959 —1.9850 —0.2000 1.9577 3.9500 0.0400 
25 2 —1.8446 —1.4124 —0.1997 3.4155 2.0131 0.0400 
0.3 255 2.5 —1.8540 —1.9353 —0.2000 3.4476 3.7582 0.0400 
1.5 2 —0.8223 —1.7704 —0.0589 0.6872 3.1343 0.0136 
1.5 3 —0.7202 —2.6700 0.0713 0.5334 7.1291 0.0082 
2 —1.1822 —1.7666 —0.0980 1.4308 3.1211 0.0190 
3 —1.0734 —2.6663 0.0354 1.1817 7.1092 0.0081 
1.5 1.5 0.0358 —0.2099 —0.2001 0.0349 0.1523 0.1085 
2 2 —1.3075 —1.3560 —0.4979 1.7241 1.8657 0.2490 
2 2.5 —1.3272 —1.8971 —0.5000 1.7739 3.6182 0.2500 
25 2 —1.7638 —1.2990 0.4944 3.1291 1.7271 0.2472 
25 2.5 —1.7891 —1.8434 —0.5000 3.2150 3.4195 0.2500 
on 1.5 2 —0.7493 —1.7659 —0.3415 0.5768 3.1185 0.1291 
1.5 3 —0.6766 —2.6501 —0.1826 0.4757 7.0232 0.0487 
2 2 —1.5184 —1.7599 —0.2220 2.3575 3.0975 0.0821 
2 3 —1.4034 —2.6450 —0.0460 2.0494 6.9963 0.0335 
1.5 1.5 —0.4450 —1.3069 —0.7435 0.2438 1.7081 0.5589 

Table 5: Bias and MSE of ML estimates for n = 100.


1,5 1.5 —0.2858 —0.6061 0.1923 0.1263 0.4292 0.0389 
2 2 1.3906 —1.4739 —0.2000 1.9418 2.1823 0.0400 
2 2.5 —1.4006 —1.9932 —0.2000 1.9676 3.9793 0.0400 
25 2 1.8513 —1.4220 —0.2000 3.4374 2.0365 0.0400 
0.3 2.5 2.5 —1.8608 —1.9348 —0.2000 3.4704 3.7530 0.0400 
1:5 2 —0.8290 —1.7709 —0.0640 0.6948 3.1362 0.0126 
1.5 3 —0.7263 —2.6694 0.0779 0.5387 7.1259 0.0081 
2 2 —1.1905 —1.7676 —0.1026 1.4440 3.1244 0.0192 
2 3 —1.0892 —2.6663 0.0454 1.2106 7.1093 0.0071 
1.5 1, 0.0554 —0.2390 0.2303 0.0323 0.1550 0.1215 
2 2 1.3191 1.3763 —0.5000 1.7501 1.9094 0.2500 
2 2.5 1.3376 1.9102 —0.5000 1.7971 3.6590 0.2500 
25) 2 =1.7751 -1.3179 —0.4998 3.1645 1.7589 0.2498 
ae 25 2.5, —1.7965 —1.8536 —0.5000 3.2381 3.4505 0.2500 
15 2 —0.7568 —1.7667 —0.3450 0.5835 3.1211 0.1284 
1.5 3 —0.6900 —2.6504 —0.1738 0.4886 7.0245 0.0430 
2 2 —1.5453 —1.7597 —0.2001 2.4224 3.0965 0.0686 
2 3 —1.4298 —2.6433 —0.0291 2.1084 6.9871 0.0275 
1.5 1.5 —0.4529 —1.3084 —0.7496 0.2375 1.7120 0.5666 


7. Real data illustration 


In this section, we use a real dataset to illustrate the application of the RTGK distribution. We consider
the SC16 dataset, which records unit capacity factors estimated by the SC16 algorithm. The data were
reported by Caramanis et al. (1983) and Mazumdar and Gaver (1984) and were later used by Khan and Arshad
(2016). They are presented in Table 6.

To assess how well the RTGK distribution fits the SC16 dataset, we use the Kolmogorov-Smirnov (KS) test.
The KS test supports the RTGK distribution for the SC16 dataset with α = 0.5009893, θ = 1.2976525 and
p = 0.1: the KS distance is 0.18003 and the p-value is 0.4452. The fit is shown in Figure 3 in terms of
the probability density and cumulative distribution functions.

The parameter values used for the fitted curves are also the ML estimates of the unknown parameters
of the RTGK distribution. For these values, the first four raw moments of the distribution are
0.28969521, 0.16036179, 0.10818759 and 0.08049182, respectively. The variance of the distribution is
0.07643848. The coefficients of skewness and kurtosis are 0.82541778 and -0.47634863, respectively.
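
The reported variance, skewness and kurtosis can be cross-checked from the four raw moments quoted above. The following is a minimal sketch in Python, assuming the standard identities that convert raw moments to central moments; the function name `summary_from_raw_moments` is illustrative and is not part of the original analysis, which was carried out in R. Under these identities the reported kurtosis coefficient corresponds to the excess kurtosis, that is, the standardized fourth central moment minus 3.

```python
def summary_from_raw_moments(m1, m2, m3, m4):
    """Return (variance, skewness, excess kurtosis) from the first four raw moments."""
    variance = m2 - m1 ** 2
    # Third and fourth central moments expressed through raw moments.
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    skewness = mu3 / variance ** 1.5
    excess_kurtosis = mu4 / variance ** 2 - 3.0
    return variance, skewness, excess_kurtosis


# Raw moments of the fitted RTGK distribution as quoted in the text.
raw_moments = (0.28969521, 0.16036179, 0.10818759, 0.08049182)
variance, skewness, excess_kurtosis = summary_from_raw_moments(*raw_moments)
print(variance, skewness, excess_kurtosis)
```

The three printed values should agree with the figures reported in the text up to rounding of the quoted moments.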


Table 6: SC16 Dataset. 


0.853 0.759 0.866 0.809 0.717 0.544 0.492 0.403 0.344 


0.213 0.116 0.116 0.092 0.07 0.059 0.048 0.036 0.029 


0.021 0.014 0.011 0.008 0.006 

Figure 3: Plots depicting fitting of SC16 dataset.


8. Concluding remarks 


In this chapter, we have proposed a new probability distribution called the RTGK distribution.
We have derived several distributional properties of this distribution, including descriptive
statistics, the shape of the probability distribution, the moment generating function and more. We have also
considered point estimation of the unknown parameters of the RTGK distribution using the maximum likelihood
technique. Further, we have reported these properties in tabular form with the aid of R software. The shape of
the distribution is visualized for different configurations of the unknown parameters. In the simulation study,
we have assessed the performance of the ML estimates using the criteria of bias and MSE. Moreover, we have
included a real dataset to illustrate the application of the proposed distribution.

Finding an Efficient Distribution to Analyze
Lifetime Data through Simulation Study
Anamul Haque Sajib, Trishna Saha and M Sayedur Rahman


1. Introduction 


Lifetime or survival data arise in many disciplines, most notably medicine, engineering and actuarial
science. The Gamma and the Weibull distributions are preferred over the Exponential distribution for
modelling time-to-event data because the former two have flexible hazard functions, whereas the latter
has a constant hazard function. Unfortunately, both distributions have certain drawbacks. For example,
computing the distribution function or survival (hazard) function of the Gamma distribution requires
computer software or mathematical tables, which are approximate. Although the distribution function or
survival (hazard) function of the Weibull distribution can be computed directly, its hazard function
increases from zero to infinity when the shape parameter is greater than one. Because of this property,
the Weibull distribution may be unrealistic in some situations. For instance, in practice the hazard rate
should settle at a stable value, rather than increase to infinity, when population items are kept in regular
follow-up programs. In contrast, the hazard of the Gamma density increases to a finite limit for shape
parameter values greater than one. Therefore, the Gamma distribution could be a more suitable alternative
to the Weibull distribution. Furthermore, the MLEs of the Weibull parameters are not stable for all values
of the parameters (Bain, 1978). However, the Weibull distribution is often preferred to the Gamma
distribution for analyzing lifetime data because it handles censored observations much more easily
(Gupta and Kundu, 1997).

Gupta and Kundu (1999) introduced the three-parameter Generalized Exponential (GE) distribution as an
alternative to the Gamma or Weibull distribution. They showed that many properties of this family are
very similar to those of the Gamma family, but that its distribution function has a closed form, like that
of the Weibull. For example, as for the Gamma density, the likelihood ratio ordering property with respect
to the shape parameter also holds for the GE. As a result, a uniformly most powerful (UMP) test can be
constructed for one-sided hypotheses about the shape parameter. The Weibull distribution, on the other hand,
does not have this likelihood ratio ordering property. In their data analysis, which used one real data set,
the GE fitted better than the three-parameter Gamma and Weibull distributions.
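
The contrast between an unbounded hazard and a hazard that levels off can be illustrated numerically. The sketch below is a hedged illustration rather than the authors' code: it evaluates the Weibull hazard and the GE hazard with the location parameter set to zero, taking the Weibull cdf as 1 - exp(-(x/λ)^α) and the GE cdf as (1 - exp(-x/λ))^α, with shape α and scale λ. The function names are hypothetical. The GE is used for the bounded case because its hazard has a closed form; for α > 1 it rises towards 1/λ, whereas the Weibull hazard keeps growing.

```python
import math


def weibull_hazard(x, alpha, lam):
    """Hazard of the Weibull with cdf 1 - exp(-(x/lam)**alpha), location 0."""
    return (alpha / lam) * (x / lam) ** (alpha - 1)


def ge_hazard(x, alpha, lam):
    """Hazard f/(1 - F) of the GE with cdf (1 - exp(-x/lam))**alpha, location 0."""
    e = math.exp(-x / lam)
    pdf = (alpha / lam) * (1 - e) ** (alpha - 1) * e
    # 1 - (1 - e)**alpha, computed stably when e is very small.
    survival = -math.expm1(alpha * math.log1p(-e))
    return pdf / survival


alpha, lam = 2.5, 1.2
for x in (1.0, 5.0, 10.0, 20.0):
    print(x, weibull_hazard(x, alpha, lam), ge_hazard(x, alpha, lam))
print("GE limiting hazard 1/lam =", 1 / lam)
```

For shape values below one both hazards are decreasing instead, so the comparison above is only meaningful for α > 1.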

Unfortunately, Gupta and Kundu (1999) did not include a simulation study to support their claim.
Furthermore, it has not yet been explored how the GE performs relative to the Weibull or Gamma distribution
when the data contain censored observations. Motivated by this, we aim to conduct an extensive simulation
study to learn how the GE performs compared to the Weibull or the Gamma distribution in cases with and
without censoring.


2. Methodology 


In this section, we briefly introduce all the probability density functions used in this paper. The
probability density function of the three-parameter GE is given by

$$f(x; \alpha, \lambda, \mu) = \frac{\alpha}{\lambda}\left(1 - e^{-(x-\mu)/\lambda}\right)^{\alpha-1} e^{-(x-\mu)/\lambda}, \quad x > \mu,\ \alpha, \lambda > 0,$$

where α, λ and μ are the shape, scale and location parameters, respectively. Similarly, the three-parameter
Gamma and Weibull densities are defined as

$$f(x; \alpha, \lambda, \mu) = \frac{(x-\mu)^{\alpha-1}}{\Gamma(\alpha)\,\lambda^{\alpha}}\, e^{-(x-\mu)/\lambda}, \quad x > \mu,\ \alpha, \lambda > 0,$$

and

$$f(x; \alpha, \lambda, \mu) = \frac{\alpha}{\lambda}\left(\frac{x-\mu}{\lambda}\right)^{\alpha-1} e^{-\left(\frac{x-\mu}{\lambda}\right)^{\alpha}}, \quad x > \mu,\ \alpha, \lambda > 0,$$

respectively. Here α, λ and μ in both the Gamma and Weibull densities have the same meaning as α, λ and μ
in the GE density. Apart from these three density functions, we have also used the Lindley and half-logistic
distributions for simulation purposes. The probability density function of a Lindley variate X (survival
time), introduced by Lindley (1958), is defined as

$$f(x; \theta) = \frac{\theta^{2}}{1+\theta}\,(1+x)\,e^{-\theta x}, \quad x, \theta > 0,$$

where θ is the scale parameter. Finally, the probability density function of the half-logistic distribution
is given by

$$f(x) = \frac{2e^{-x}}{\left(1+e^{-x}\right)^{2}}, \quad x \ge 0.$$
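
The five densities can be written down directly and sanity-checked by numerical integration. The sketch below is illustrative only and assumes the usual shape, scale and location forms of the three-parameter GE, Gamma and Weibull densities (shape α, scale λ, location μ), together with the standard Lindley and half-logistic densities. All function names and the trapezoid helper are hypothetical and not taken from the chapter.

```python
import math


def ge_pdf(x, alpha, lam, mu=0.0):
    if x <= mu:
        return 0.0
    t = (x - mu) / lam
    return (alpha / lam) * (1 - math.exp(-t)) ** (alpha - 1) * math.exp(-t)


def gamma_pdf(x, alpha, lam, mu=0.0):
    if x <= mu:
        return 0.0
    t = x - mu
    return t ** (alpha - 1) * math.exp(-t / lam) / (math.gamma(alpha) * lam ** alpha)


def weibull_pdf(x, alpha, lam, mu=0.0):
    if x <= mu:
        return 0.0
    t = (x - mu) / lam
    return (alpha / lam) * t ** (alpha - 1) * math.exp(-t ** alpha)


def lindley_pdf(x, theta):
    if x < 0:
        return 0.0
    return theta ** 2 / (1 + theta) * (1 + x) * math.exp(-theta * x)


def half_logistic_pdf(x):
    if x < 0:
        return 0.0
    return 2 * math.exp(-x) / (1 + math.exp(-x)) ** 2


def trapezoid(f, a, b, steps=20000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h


# Each density should integrate to approximately 1 over its (truncated) support.
areas = {
    "GE": trapezoid(lambda x: ge_pdf(x, 2.5, 1.2, 6.0), 6.0, 66.0),
    "Gamma": trapezoid(lambda x: gamma_pdf(x, 2.5, 1.2, 6.0), 6.0, 66.0),
    "Weibull": trapezoid(lambda x: weibull_pdf(x, 2.5, 1.2, 6.0), 6.0, 66.0),
    "Lindley": trapezoid(lambda x: lindley_pdf(x, 0.5), 0.0, 80.0),
    "Half-logistic": trapezoid(half_logistic_pdf, 0.0, 60.0),
}
print(areas)
```

The parameter values (α = 2.5, λ = 1.2, μ = 6) and θ = 0.5 are chosen to match one of the configurations used in the simulation study.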


3. Simulation setting 


We used simulated data sets to investigate the performance of the Gamma, Weibull and Generalized
Exponential distributions for analysing skewed or lifetime data. Two settings are considered: (i) data are
simulated from one of these three distributions, and (ii) data are simulated from other lifetime
distributions (Lindley and half-logistic). The purpose of these two settings is to investigate how the
Gamma, Weibull and Generalized Exponential distributions perform when the data originally came from one of
these three distributions and when they came from another distribution. All the distributions considered
here share one property: their hazard functions are increasing or decreasing.

For each setting, we considered different sample sizes and different combinations of parameter values for
the distribution from which the data are simulated, in order to see their effect on the performance of each
fitted distribution. The performance of each distribution is evaluated using the Kolmogorov-Smirnov (KS)
goodness-of-fit test, the Anderson-Darling (AD) test and Monte Carlo simulation. All the results presented
in the results section are reproducible, because fixed seed numbers were used to produce them.
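
The structure of such a seeded Monte Carlo experiment can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: samples are drawn from the three-parameter Weibull by inverse-transform sampling, and the KS distance is computed against reference cdfs with known parameters instead of ML-fitted ones. The 5% decision uses the asymptotic critical value 1.358/sqrt(n), which is only appropriate when the reference parameters are not estimated from the same data. All function names are hypothetical.

```python
import math
import random


def weibull_cdf(x, alpha, lam, mu):
    return 0.0 if x <= mu else 1.0 - math.exp(-((x - mu) / lam) ** alpha)


def ge_cdf(x, alpha, lam, mu):
    return 0.0 if x <= mu else (1.0 - math.exp(-(x - mu) / lam)) ** alpha


def weibull_sample(rng, n, alpha, lam, mu):
    """Inverse-transform sampling: x = mu + lam * (-log(1 - u))**(1/alpha)."""
    return [mu + lam * (-math.log(1.0 - rng.random())) ** (1.0 / alpha) for _ in range(n)]


def ks_distance(sample, cdf):
    """Largest gap between the empirical cdf of the sample and a reference cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def monte_carlo(n, reps, seed, cdf, alpha=2.5, lam=1.2, mu=6.0):
    """Average KS distance and number of accepted H0 over seeded replicates."""
    rng = random.Random(seed)          # fixed seed makes the run reproducible
    critical = 1.358 / math.sqrt(n)    # asymptotic 5% critical value
    total, accepted = 0.0, 0
    for _ in range(reps):
        d = ks_distance(weibull_sample(rng, n, alpha, lam, mu), cdf)
        total += d
        accepted += d <= critical
    return total / reps, accepted


true_cdf = lambda x: weibull_cdf(x, 2.5, 1.2, 6.0)
other_cdf = lambda x: ge_cdf(x, 2.5, 1.2, 6.0)
avg_true, acc_true = monte_carlo(250, 200, seed=1, cdf=true_cdf)
avg_other, acc_other = monte_carlo(250, 200, seed=1, cdf=other_cdf)
print(avg_true, acc_true, avg_other, acc_other)
```

Because the GE reference here reuses the Weibull parameter values without refitting, its KS distances are much larger than those in a study where each candidate distribution is fitted by ML before testing.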


4. Results and discussions 


In this section, we present the results of our simulation study and discuss the findings in detail.
Tables 1-3 show the average KS distance and the total number of accepted H0 for each method, for different
n and different parameter values. More specifically, column 3 of Table 1 shows the average KS distances for
the Gamma, Weibull and GE densities when they are fitted to sample data that originally came from the
Weibull (α = 2.5, λ = 1.2, μ = 6). The average KS distance is calculated from 1000 simulated data sets for
each sample size. We use the notation KS (Gamma), KS (Weibull) and KS (GE) to denote the KS distance for
the Gamma, Weibull and GE densities, respectively. From Table 1, it is observed that
KS (Gamma) < KS (Weibull) < KS (GE) when n ≤ 250, and that as the sample size increases the relation
becomes KS (Weibull) < KS (Gamma) < KS (GE).

In other words, the Gamma, Weibull and GE perform roughly equally well for small sample sizes (n ≤ 250),
with the Gamma marginally better than the other two, while for large sample sizes (n > 250) the Weibull
performs marginally better than the Gamma but far better than the GE.

The H0 used in columns 4-5 of Table 1 is defined through “H0: the sample is drawn from the reference
distribution” vs “H1: the sample is drawn from any other distribution” under both the KS and AD tests. For
example, when the Gamma density is fitted to the generated data, the null hypothesis is
“H0: the sample is drawn from the Gamma distribution”. Similarly, when the Weibull and GE are fitted to the
sample data, H0 changes accordingly. From the KS and AD test results, the total number of accepted H0 for
the Gamma and Weibull is roughly equal across all sample sizes, but for the GE the number of accepted H0 is
very low for large sample sizes (n ≥ 500). Therefore, based on the KS and AD test results, when the data came
from the Weibull (α = 2.5, λ = 1.2, μ = 6), fitting the Gamma or the Weibull is roughly equally efficient
across all sample sizes, whereas fitting the GE is a bad choice for large sample sizes, especially n ≥ 500.
These findings are consistent with those based on the KS distance.

From Table 2, it is observed that the Gamma, Weibull and GE perform roughly equally well (the Gamma
performs marginally better than the other two densities) across all sample sizes when the sample data
originally came from the Weibull (α = 1.5, λ = 3.2, μ = 2), while Table 3 has findings similar to those of
Table 1.

Figure 1 shows the fitted Weibull (red), Gamma (blue) and GE (green) densities overlaid on the sample
histogram (data originally from the Weibull (α = 2.5, λ = 1.2, μ = 6)), together with the empirical cdf
versus fitted cdf plots, for a sample of size n = 1000. From this plot, it is evident that the Weibull fits
better than the other two densities, which is consistent with the above findings.

Tables 4-6 and Figure 2 present the same kind of information as Tables 1-3 and Figure 1, except that the
sample data originally came from the Gamma density.


Table 1: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Weibull (α = 2.5, λ = 1.2, μ = 6), (α > λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.125              970       970
              Gamma          0.121              998       993
              GE             0.133              991       991
n = 50        Weibull        0.094              938       937
              Gamma          0.085              1000      999
              GE             0.100              997       999
n = 100       Weibull        0.066              950       950
              Gamma          0.061              1000      1000
              GE             0.078              996       998
n = 250       Weibull        0.043              970       970
              Gamma          0.041              999       999
              GE             0.061              966       962
n = 500       Weibull        0.028              987       987
              Gamma          0.031              999       999
              GE             0.053              778       665
n = 1000      Weibull        0.019              996       995
              Gamma          0.024              996       998
              GE             0.048              268       39

Table 2: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Weibull (α = 1.5, λ = 3.2, μ = 2), (α < λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.177              796       728
              Gamma          0.159              873       841
              GE             0.169              833       815
n = 50        Weibull        0.114              862       844
              Gamma          0.092              983       981
              GE             0.096              971       970
n = 100       Weibull        0.078              910       910
              Gamma          0.064              1000      1000
              GE             0.067              999       999
n = 250       Weibull        0.050              958       958
              Gamma          0.044              999       1000
              GE             0.047              997       999
n = 500       Weibull        0.038              962       962
              Gamma          0.034              998       1000
              GE             0.038              983       997
n = 1000      Weibull        0.024              984       984
              Gamma          0.028              988       999
              GE             0.033              933       963


Table 3: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Weibull (α = 3, λ = 3.2, μ = 4), (α ≈ λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.117              996       994
              Gamma          0.122              992       992
              GE             0.138              992       987
n = 50        Weibull        0.085              997       995
              Gamma          0.087              1000      1000
              GE             0.108              998       999
n = 100       Weibull        0.061              991       987
              Gamma          0.063              1000      1000
              GE             0.087              990       991
n = 250       Weibull        0.040              984       984
              Gamma          0.043              999       1000
              GE             0.071              867       756
n = 500       Weibull        0.031              967       967
              Gamma          0.033              998       1000
              GE             0.064              410       106
n = 1000      Weibull        0.026              960       961
              Gamma          0.026              985       998
              GE             0.058              (3        0

Figure 1: Fitted Weibull (red), Gamma (blue) and GE (green) densities overlaid on the sample histogram
(data originally from the Weibull (α = 2.5, λ = 1.2, μ = 6)), and the empirical cdf versus fitted cdf plots.


Table 4: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Gamma (α = 2.5, λ = 1.2, μ = 6), (α > λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.162              879       870
              Gamma          0.137              981       956
              GE             0.139              982       956
n = 50        Weibull        0.108              908       901
              Gamma          0.088              993       997
              GE             0.089              991       992
n = 100       Weibull        0.073              938       935
              Gamma          0.061              997       997
              GE             0.062              997       999
n = 250       Weibull        0.051              934       930
              Gamma          0.038              1000      1000
              GE             0.040              999       1000
n = 500       Weibull        0.044              898       8904
              Gamma          0.027              1000      999
              GE             0.030              999       1000
n = 1000      Weibull        0.037              863       831
              Gamma          0.020              999       999
              GE             0.023              993       987

Table 5: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Gamma (α = 1.5, λ = 3.2, μ = 2), (α < λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.248              542       481
              Gamma          0.181              822       769
              GE             0.182              825       759
n = 50        Weibull        0.187              614       605
              Gamma          0.106              918       900
              GE             0.109              909       901
n = 100       Weibull        0.145              670       669
              Gamma          0.065              993       992
              GE             0.067              987       986
n = 250       Weibull        0.124              703       703
              Gamma          0.040              1000      1000
              GE             0.041              1000      1000
n = 500       Weibull        0.080              826       827
              Gamma          0.028              998       997
              GE             0.029              1000      1000
n = 1000      Weibull        0.042              932       935
              Gamma          0.020              997       996
              GE             0.021              999       1000


Table 6: Average KS distances and number of accepted H0, determined using 1000 synthetic data sets that
originally came from the Gamma (α = 3, λ = 3.2, μ = 4), (α ≈ λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.484              173       162
              Gamma          0.128              989       978
              GE             0.137              975       948
n = 50        Weibull        0.480              203       198
              Gamma          0.086              998       998
              GE             0.088              997       998
n = 100       Weibull        0.445              288       283
              Gamma          0.061              999       999
              GE             0.062              999       1000
n = 250       Weibull        0.454              270       266
              Gamma          0.040              996       986
              GE             0.040              1000      1000
n = 500       Weibull        0.471              262       259
              Gamma          0.028              985       981
              GE             0.029              1000      1000
n = 1000      Weibull        0.480              247       226
              Gamma          0.021              976       968
              GE             0.023              999       1000

Figure 2: Fitted Weibull (red), Gamma (blue) and GE (green) densities overlaid on the sample histogram (data
that originally came from the Gamma (α = 2.5, λ = 1.2, μ = 6)), and the empirical cdf versus fitted cdf plots.


From these tables, it is clear that the Gamma and GE perform better than the Weibull density for all sample
sizes, and that the efficiency of the Gamma and GE is approximately equal in all situations. In particular,
the Weibull density performs very badly when the data originally came from the Gamma density with α ≈ λ.
Figure 2 shows the performance of the Gamma, Weibull and GE densities for one data set of size n = 1000 that
originally came from the Gamma (α = 2.5, λ = 1.2, μ = 6); these findings are consistent with those from
Tables 4-6.

When the data originally came from the GE density, the GE performs better than the Gamma and Weibull for
all sample sizes and parameter combinations. In general, the Gamma performs marginally better than the
Weibull for all sample sizes and combinations. However, the Weibull performs badly compared to the Gamma
when α ≈ λ for the GE density. These findings are shown in Tables 7-9 and in Figure 3.

Finally, when the data originally came from distributions other than these three, the Weibull distribution
performs better than both the Gamma and GE densities, while the performance of the latter two is roughly
equal. For example, when the data originally came from the Lindley and half-logistic densities, the Weibull
distribution fitted the sample data better than both the Gamma and GE in all scenarios, as shown in
Tables 10-13 and in Figures 4-5.

To sum up, none of the Gamma, Weibull and GE densities is the best for modelling skewed or lifetime data in
all situations. When the data came from different distributions

Table 7: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the GE (α = 2.5, λ = 1.2, μ = 6), (α > λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.183              809       790
              Gamma          0.138              981       964
              GE             0.148              949       904
n = 50        Weibull        0.140              834       832
              Gamma          0.095              992       994
              GE             0.092              989       987
n = 100       Weibull        0.099              899       899
              Gamma          0.073              981       968
              GE             0.062              999       999
n = 250       Weibull        0.055              974       973
              Gamma          0.061              815       686
              GE             0.039              1000      1000
n = 500       Weibull        0.038              983       968
              Gamma          0.055              623       460
              GE             0.028              1000      1000
n = 1000      Weibull        0.033              900       764
              Gamma          0.054              387       285
              GE             0.020              1000      1000



Table 8: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the GE (α = 1.5, λ = 3.2, μ = 2), (α < λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.216              682       606
              Gamma          0.188              794       742
              GE             0.194              764       710
n = 50        Weibull        0.152              Ee        708
              Gamma          0.109              910       895
              GE             0.114              884       874
n = 100       Weibull        0.109              806       803
              Gamma          0.065              983       982
              GE             0.068              983       969
n = 250       Weibull        0.098              786       787
              Gamma          0.040              1000      1000
              GE             0.040              1000      999
n = 500       Weibull        0.103              746       745
              Gamma          0.029              1000      1000
              GE             0.029              999       997
n = 1000      Weibull        0.127              648       653
              Gamma          0.020              1000      1000
              GE             0.020              1000      1000

Table 9: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the GE (α = 3, λ = 3.2, μ = 4), (α ≈ λ).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.154              939       913
              Gamma          0.140              951       926
              GE             0.145              954       915
n = 50        Weibull        0.098              952       949
              Gamma          0.088              994       995
              GE             0.089              995       995
n = 100       Weibull        0.070              961       961
              Gamma          0.061              1000      1000
              GE             0.061              999       999
n = 250       Weibull        0.051              959       957
              Gamma          0.039              1000      1000
              GE             0.039              999       999
n = 500       Weibull        0.045              923       895
              Gamma          0.029              999       1000
              GE             0.028              1000      1000
n = 1000      Weibull        0.042              766       653
              Gamma          0.022              997       1000
              GE             0.020              997       995


Table 10: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Lindley (θ = 0.5).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.151              936       856
              Gamma          0.197              794       749
              GE             0.222              626       593
n = 50        Weibull        0.097              971       959
              Gamma          0.114              903       887
              GE             0.123              842       826
n = 100       Weibull        0.065              997       997
              Gamma          0.071              982       979
              GE             0.076              946       936
n = 250       Weibull        0.041              998       997
              Gamma          0.046              998       1000
              GE             0.048              995       999
n = 500       Weibull        0.030              1000      1000
              Gamma          0.036              996       998
              GE             0.038              983       995
n = 1000      Weibull        0.042              999       1000
              Gamma          0.022              974       992
              GE             0.020              945       997

Figure 3: Fitted Weibull (red), Gamma (blue) and GE (green) densities overlaid on the sample histogram (data
originally from the GE (α = 2.5, λ = 1.2, μ = 6)), and the empirical cdf versus fitted cdf plots.


other than these three densities, the Weibull density performed marginally better than the Gamma and GE.
Interestingly, the Gamma density performs marginally better than the Weibull even when the sample data
originally came from the Weibull, especially for small sample sizes (n ≤ 250). For large sample sizes, the
Gamma and Weibull are equally efficient but the GE performs badly. On the other hand, the Weibull density
performs very badly in some situations when the data originally came from the Gamma density. As expected,
the Gamma and GE perform best, for all sample sizes and parameter combinations, when the data originally
came from the Gamma and GE, respectively.


5. Conclusion 


This chapter investigates the rationale for using the Generalized Exponential (GE) density as a substitute
for the Gamma and Weibull densities to model skewed or lifetime data. The simulation study suggests that,
in general, the Weibull density performs marginally better than the Gamma and GE, across all parameter
values and sample sizes, when the sample data originally came from skewed distributions other than the
Gamma, Weibull and GE. As expected, the Gamma and the GE perform best, for all sample sizes and all
combinations of parameter values, when the data originally

Table 11: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Lindley (θ = 0.8).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.153              931       853
              Gamma          0.179              843       806
              GE             0.203              739       678
n = 50        Weibull        0.103              965       941
              Gamma          0.121              890       873
              GE             0.137              766       749
n = 100       Weibull        0.068              981       978
              Gamma          0.076              960       953
              GE             0.082              924       896
n = 250       Weibull        0.043              983       982
              Gamma          0.047              991       995
              GE             0.049              983       989
n = 500       Weibull        0.030              1000      999
              Gamma          0.036              996       998
              GE             0.038              983       993
n = 1000      Weibull        0.022              997       999
              Gamma          0.029              967       992
              GE             0.031              943       979


Table 12: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Lindley (θ = 0.1).

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.153              931       853
              Gamma          0.179              843       806
              GE             0.203              739       678
n = 50        Weibull        0.103              965       941
              Gamma          0.121              890       873
              GE             0.137              766       749
n = 100       Weibull        0.068              981       978
              Gamma          0.076              960       953
              GE             0.082              924       896
n = 250       Weibull        0.043              983       982
              Gamma          0.047              991       995
              GE             0.049              983       989
n = 500       Weibull        0.030              1000      999
              Gamma          0.036              996       998
              GE             0.038              983       993
n = 1000      Weibull        0.022              997       999
              Gamma          0.029              967       992
              GE             0.031              943       979

Table 13: Average KS distances and total number of accepted H0, determined using 1000 synthetic data sets
that originally came from the Half-logistic.

Sample size   Distribution   Average distance   H0 (KS)   H0 (AD)
n = 25        Weibull        0.151              930       866
              Gamma          0.193              7196      728
              GE             0.212              693       635
n = 50        Weibull        0.114              885       852
              Gamma          0.116              925       902
              GE             0.127              837       823
n = 100       Weibull        0.070              986       974
              Gamma          0.078              960       965
              GE             0.083              922       913
n = 250       Weibull        0.043              1000      999
              Gamma          0.050              996       999
              GE             0.052              985       989
n = 500       Weibull        0.032              998       1000
              Gamma          0.040              997       993
              GE             0.042              957       982
n = 1000      Weibull        0.026              998       993
              Gamma          0.034              889       937
              GE             0.036              824       874

Figure 4: Fitted Weibull (red), Gamma (blue) and GE (green) densities overlaid on the sample histogram
(data that originally came from the Lindley (θ = 0.5)), and the empirical cdf versus fitted cdf plots.

Figure 5: Fitted Weibull (red), Gamma (blue) and GE (green) densities overlaid on the sample histogram (data
that originally came from the Half-logistic), and the empirical cdf versus fitted cdf plots.


came from the Gamma and the GE, respectively. Interestingly, the Gamma also performed as well as the
Weibull when the data originally came from the Weibull, but the performance of the GE was very poor,
especially for large sample sizes. This work is limited to investigating the performance of the Gamma,
Weibull and GE in modelling lifetime data without censoring. An investigation of their performance in
modelling lifetime data with censoring is currently under way.

Chapter 16


Exponentiated Muth Distribution
Properties and Applications
R Maya, MR Irshad and Anuresha Krishna


1. Introduction 


Against the backdrop of reliability theory, Muth (1977) introduced a continuous probability distribution,
namely the Muth distribution (MD). However, until the work of Jodra et al. (2015), this distribution was
neglected in the literature. A continuous random variable Z is said to have the MD with parameter α if its
probability density function (pdf) is given by

$$f(z; \alpha) = \left(e^{\alpha z} - \alpha\right)\exp\left\{\alpha z - \frac{1}{\alpha}\left(e^{\alpha z} - 1\right)\right\}, \quad z > 0,\ \alpha \in (0, 1]. \tag{1.1}$$

The cumulative distribution function (cdf) of the MD is obtained as

$$F(z; \alpha) = 1 - \exp\left\{\alpha z - \frac{1}{\alpha}\left(e^{\alpha z} - 1\right)\right\}, \quad z > 0. \tag{1.2}$$


To demonstrate the significance of the model, Jodra et al. (2015) examined the scaled Muth distribution
(SMD) and fitted it to a rainfall data set. They showed that the SMD is superior to well-known distributions
such as the Exponential, Gamma, Lognormal and Weibull. Since the work of Jodra et al. (2015), the study of
the SMD has gained a prominent position in the literature from both theoretical and applied perspectives.
Jodra et al. (2017) studied the power version of the Muth random variable and highlighted its competence on
data for the breaking stress of carbon fibres and the failure times of Kevlar 49/epoxy strands. Inferential
aspects of a geometric process with SMD are discussed by Bicer et al. (2021). Estimation of the scale
parameter of the MD and the power MD is discussed by Irshad et al. (2021) and Irshad et al. (2020),
respectively. Meanwhile, a study on the scaled intermediate Muth distribution was carried out by Jodra and
Arshad (2021). Motivated by the commendable performance of the MD and its variants, in this paper we
consider a more generalized form of the MD by incorporating an additional shape parameter β, and name the
resulting distribution the Exponentiated Muth distribution (EMD), of which the MD is a special case.
Compared to the MD, one of the appealing characteristics of the EMD is its flexibility in modelling the
various forms of the hazard rate function that are common in lifetime data analysis and reliability studies.
Moreover, the distribution under study is found to be suitable for rainfall, glass fibre and carbon fibre
data.

The remainder of the paper is organized as follows. Section 2 introduces the distribution and presents
related functions, such as the survival function and the hazard rate function, as well as some of its
important properties. Identifiability, moments, the moment generating function (mgf) and various reliability
measures, including the vitality function and the mean residual life function (mrlf), are derived in
Section 3. The uncertainty measures extropy and residual extropy are covered in Section 4. Section 5
discusses maximum likelihood estimation (MLE) of the parameters, the Fisher information matrix and
asymptotic confidence intervals. This section also describes the asymptotic behaviour of the EMD using
several simulated data sets; both MLE and Bayesian approaches are used to estimate the unknown parameters
of the EMD. In Section 6, the proposed distribution is illustrated with three real data sets. Finally, the
study is concluded in Section 7.


2. The exponentiated Muth distribution 


The EMD is presented in this section, along with some of its properties. 


Definition 2.1. A continuous random variable Z is said to follow an EMD with parameters α ∈ (0,1] and β > 0
if its cdf is of the following form, for z > 0,

$$F(z) = \left(1 - \phi(z; \alpha)\right)^{\beta}, \tag{2.1}$$

where

$$\phi(z; \alpha) = \exp\left\{\alpha z - \frac{1}{\alpha}\left(e^{\alpha z} - 1\right)\right\}. \tag{2.2}$$

On differentiating (2.1) with respect to z, the pdf f(z) of the EMD is obtained in the following form
(for z > 0),

$$f(z) = \beta\left(e^{\alpha z} - \alpha\right)\phi(z; \alpha)\left(1 - \phi(z; \alpha)\right)^{\beta - 1}, \tag{2.3}$$

where φ(z; α) is given in (2.2).


Special cases
1. When β = 1 in (2.3), it reduces to the pdf of the MD.
2. When β = 1 and α → 0 in (2.3), it reduces to the unit exponential distribution.
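
The definition and the two special cases can be checked numerically. The following sketch assumes φ(z; α) = exp{αz - (e^{αz} - 1)/α}, the EMD cdf (1 - φ)^β and the EMD pdf β(e^{αz} - α)φ(1 - φ)^{β-1} for z > 0; the function names are illustrative and do not come from the chapter.

```python
import math


def phi(z, alpha):
    """phi(z; alpha) = exp{alpha*z - (exp(alpha*z) - 1)/alpha}, i.e. the MD survival function."""
    return math.exp(alpha * z - math.expm1(alpha * z) / alpha)


def muth_pdf(z, alpha):
    """pdf of the Muth distribution, valid for z > 0."""
    return (math.exp(alpha * z) - alpha) * phi(z, alpha)


def emd_cdf(z, alpha, beta):
    """cdf of the EMD, valid for z > 0."""
    return (1.0 - phi(z, alpha)) ** beta


def emd_pdf(z, alpha, beta):
    """pdf of the EMD, valid for z > 0."""
    p = phi(z, alpha)
    return beta * (math.exp(alpha * z) - alpha) * p * (1.0 - p) ** (beta - 1.0)


z = 1.0
# Special case 1: beta = 1 gives the Muth pdf.
print(emd_pdf(z, 0.5, 1.0), muth_pdf(z, 0.5))
# Special case 2: beta = 1 and alpha close to 0 gives approximately exp(-z).
print(emd_pdf(z, 1e-6, 1.0), math.exp(-z))
# Consistency check: a central difference of the cdf approximates the pdf.
h = 1e-5
print((emd_cdf(z + h, 0.7, 2.2) - emd_cdf(z - h, 0.7, 2.2)) / (2 * h), emd_pdf(z, 0.7, 2.2))
```

Each printed pair should agree closely; the limit α → 0 is approximated here by a small positive α because the formula divides by α.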


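Based on the pdf and cdf, the survival function $\bar{F}(z)$, the hazard rate function h(z) and the reversed
hazard rate function τ(z) of the EMD are obtained as follows (for z > 0):

$$\bar{F}(z) = 1 - \left(1 - \phi(z; \alpha)\right)^{\beta}, \tag{2.4}$$

$$h(z) = \frac{f(z)}{\bar{F}(z)} = \frac{\beta\left(e^{\alpha z} - \alpha\right)\phi(z; \alpha)\left(1 - \phi(z; \alpha)\right)^{\beta - 1}}{1 - \left(1 - \phi(z; \alpha)\right)^{\beta}} \tag{2.5}$$

and

$$\tau(z) = \frac{f(z)}{F(z)} = \frac{\beta\left(e^{\alpha z} - \alpha\right)\phi(z; \alpha)\left(1 - \phi(z; \alpha)\right)^{\beta - 1}}{\left(1 - \phi(z; \alpha)\right)^{\beta}}. \tag{2.6}$$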

On differentiating (2.3) and (2.5) with respect to z, we have 


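$$f'(z) = f(z)\left\{\frac{\Delta_2(z; \alpha)}{\Delta_1(z; \alpha)} - \frac{\Delta_1(z; \alpha)\,\Psi_1(z; \alpha)}{\Psi_2(z; \alpha)}\right\} \tag{2.7}$$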
and


(2.8) 


hele) = byte] EM ea 


Aza) ¥,(z;a) 

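where $\Delta_1(z; \alpha) = e^{\alpha z} - \alpha$, $\Delta_2(z; \alpha) = \alpha e^{\alpha z}$, $\Psi_1(z; \alpha) = 1 - \beta e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}$ and $\Psi_2(z; \alpha) = 1 - e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}$.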
From (2.7) and (2.8), the EMD exhibits the following properties: 


Remark 2.1 From (2.7), one can infer that f(z) is an increasing (or decreasing) function of z if
$\Delta_2(z; \alpha)\,\Psi_2(z; \alpha)$ is greater than (or less than) $\Delta_1^2(z; \alpha)\,\Psi_1(z; \alpha)$.


Remark 2.2 From (2.8), one can infer that h,{z) is an increasing (or decreasing) function of z if 
A,(z;a) Y, (z;a) is greater than (or less than) Ay(z;a). 


Remark 2.3 By using the definition of log-concavity, we can infer that the EMD with cdf given in (2.1) is 
unimodal iff, 


$$\frac{d^2 \log f(z)}{dz^2} < 0,$$


which implies that, 
(B - 1)A}(z;a)(z;a) {Ay (esa) ¥; (z:a) — 4} a) ¥ (Z:0)} 


<W/ (za) A, (z;a){a2 +22 (z;0)}, 


Lae a 
—— az——(e*? -1) —— az——(e* —l) 
where, A, (z;a) = (e”— a), A, (za) = ae”, P, (za) = 1 - i 7 I WY, (za) =1- a e and 


wee Cane it 
Q(z;a) = A a‘ 9 


A series expansion of the cdf of the EMD is given by the following theorem:


Theorem 2.1 The cdf, $F(z)$, of the EMD can be expressed as,

$$F(z) = \sum_{k=0}^{\infty} \frac{(-1)^k (\beta + 1 - k)_k}{k!}\left(1 - F(z;\alpha)\right)^k, \tag{2.9}$$
where $F(z;\alpha)$ represents the cdf of the MD and is given in (1.2).
Proof: We have, 


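$$(1 + z)^a = \sum_{k=0}^{\infty} \frac{(a + 1 - k)_k}{k!}\, z^k \tag{2.10}$$

for $a \in \mathbb{R}$, $|z| < 1$ and $(z)_k = z(z + 1)\cdots(z + k - 1)$ for $k \geq 1$ with $(z)_0 = 1$.
Applying (2.10) in (2.1) with $a = \beta$ and $z$ replaced by $-\phi(z;\alpha) = -(1 - F(z;\alpha))$, the proof is immediate.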
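The expansion in Theorem 2.1 can be compared numerically with the closed form (2.1). The sketch below is illustrative only; `emd_cdf_series` is a hypothetical helper name, and the series is truncated after a fixed number of terms (an assumption that is adequate when $1 - F(z;\alpha)$ is not too close to 1).

```python
import math

def phi(z, alpha):
    """phi(z; alpha) of Eq. (2.2), equal to 1 - F(z; alpha) for the MD."""
    return math.exp(alpha * z - math.expm1(alpha * z) / alpha)

def emd_cdf_series(z, alpha, beta, terms=200):
    """Partial sum of the series (2.9) using `terms` terms."""
    s = phi(z, alpha)
    term, total = 1.0, 0.0          # the k = 0 term equals 1
    for k in range(terms):
        total += term
        # Consecutive terms of (2.9) differ by the factor -(beta - k) * s / (k + 1),
        # because (beta + 1 - k)_k / k! is the generalised binomial coefficient.
        term *= -(beta - k) / (k + 1) * s
    return total

if __name__ == "__main__":
    z, alpha, beta = 1.0, 0.5, 2.5
    closed_form = (1.0 - phi(z, alpha)) ** beta     # Eq. (2.1)
    print(emd_cdf_series(z, alpha, beta), closed_form)
```

Using the term-to-term ratio avoids evaluating large factorials directly; for integer $\beta$ the series terminates after $\beta + 1$ terms.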


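Corollary 2.1 By differentiating (2.9), a series expansion of the pdf of the EMD can be obtained as,

$$f(z) = \sum_{k=1}^{\infty} \frac{k\,(-1)^{k+1} (\beta + 1 - k)_k}{k!}\left(1 - F(z;\alpha)\right)^{k-1} f(z;\alpha), \tag{2.11}$$

where $f(z;\alpha)$ is the pdf of the MD and is given in (1.1).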

The pdf and hazard rate function of the EMD are plotted for different sets of parameters in Figures 1 and 2, respectively. The figures suggest the following facts about the shapes of the distribution, and show that the EMD is a more flexible model than the MD, depending on how its parameters are chosen, particularly $\beta$.

Figure 1: Different shapes of the pdf of the EMD.

Figure 2: Different shapes of the hazard rate function of the EMD.

1. For values of the parameters $\alpha$ and $\beta$ that are relatively small, the EMD will be asymmetric or bimodal, whereas as $\beta$ increases, the distribution approaches symmetry if $\alpha$ remains small.


2. The hazard rate graphs for various combinations of parameters show various shapes, including increasing, increasing and stable, stable and increasing, decreasing and stable, constant, and bathtub (decreasing-stable-increasing). The shape parameter $\beta$ therefore gives the hazard rate function a great deal of flexibility, which makes the EMD suitable for the non-monotone empirical hazard behaviours that are likely to be encountered in real-life scenarios.

The quantile function of the EMD is given through the following theorem. 


Theorem 2.2 The quantile function $\xi^*(\tau)$ of the EMD is the solution of the equation,

$$\alpha\,\xi^*(\tau) - \frac{1}{\alpha}\left(e^{\alpha \xi^*(\tau)} - 1\right) - \log\left(1 - \tau^{1/\beta}\right) = 0. \tag{2.12}$$

The proof is an immediate consequence of the definition of the quantile function, $\tau = F(\xi^*(\tau))$, where $F(z)$ is the cdf of the EMD and is given by (2.1).


Remark 2.4 The quantile function $\xi^*(\tau)$ of the EMD can also be written in terms of the quantile function $\xi(\cdot)$ of the MD through the relation $F(z) = (F(z;\alpha))^{\beta}$, where $F(z;\alpha)$ and $F(z)$ are respectively given by (1.2) and (2.1). That is, $\xi^*(\tau) = \xi(\tau^{1/\beta})$.
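Equation (2.12) has no elementary closed-form solution, so in practice it is solved numerically. A minimal sketch follows, assuming $0 < \alpha \leq 1$ (so that the left-hand side of (2.12) is decreasing in the quantile) and using plain bisection; the names `emd_quantile` and `emd_sample` are illustrative, and bisection is a choice made here rather than a method stated in the text.

```python
import math
import random

def emd_cdf(z, alpha, beta):
    """EMD cdf, Eq. (2.1)."""
    if z <= 0:
        return 0.0
    return (1.0 - math.exp(alpha * z - math.expm1(alpha * z) / alpha)) ** beta

def emd_quantile(tau, alpha, beta, iters=200):
    """Solve Eq. (2.12) for the tau-th quantile (0 < tau < 1) by bisection."""
    p = min(tau ** (1.0 / beta), 1.0 - 2.0 ** -53)   # guard against rounding to 1
    target = math.log1p(-p)                          # log(1 - tau^(1/beta)) <= 0

    def g(z):                                        # left-hand side of (2.12)
        return alpha * z - math.expm1(alpha * z) / alpha - target

    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:                               # expand until the root is bracketed
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def emd_sample(n, alpha, beta, rng=random):
    """Inverse-transform sampling: Z = quantile(U), U uniform on (0, 1)."""
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:
            out.append(emd_quantile(u, alpha, beta))
    return out

if __name__ == "__main__":
    q = emd_quantile(0.5, 0.5, 3.0)
    print(q, emd_cdf(q, 0.5, 3.0))   # the second value should be close to 0.5
```

The same routine illustrates Remark 2.4: calling it with $(\tau, \alpha, \beta)$ or with $(\tau^{1/\beta}, \alpha, 1)$ solves the same equation.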


3. Properties 


It is the purpose of this section to provide a brief summary of some statistical and reliability properties of the 
EMD, such as identifiability, moments, mgf, vitality function and mrlf. 


3.1 Identifiability 


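An important characteristic of a statistical model is its identifiability, which determines whether the model parameters can be recovered from observed data. A model is identifiable if different sets of parameters give different distributions for a given x; that is, the parameters uniquely determine the distribution. Let $f(\Theta_1)$ and $f(\Theta_2)$ be different members of the EMD indexed by $\Theta_1 = (\alpha_1, \beta_1)$ and $\Theta_2 = (\alpha_2, \beta_2)$ respectively. Then the likelihood ratio is,

$$L = \frac{f(\Theta_1)}{f(\Theta_2)} = \frac{\beta_1\left(e^{\alpha_1 x} - \alpha_1\right) e^{\alpha_1 x - \frac{1}{\alpha_1}(e^{\alpha_1 x} - 1)}\left[1 - e^{\alpha_1 x - \frac{1}{\alpha_1}(e^{\alpha_1 x} - 1)}\right]^{\beta_1 - 1}}{\beta_2\left(e^{\alpha_2 x} - \alpha_2\right) e^{\alpha_2 x - \frac{1}{\alpha_2}(e^{\alpha_2 x} - 1)}\left[1 - e^{\alpha_2 x - \frac{1}{\alpha_2}(e^{\alpha_2 x} - 1)}\right]^{\beta_2 - 1}}.$$

The logarithm of the likelihood ratio is,

$$\log L = \log \beta_1 - \log \beta_2 + \log\left(e^{\alpha_1 x} - \alpha_1\right) - \log\left(e^{\alpha_2 x} - \alpha_2\right) + (\alpha_1 - \alpha_2)x - \frac{e^{\alpha_1 x} - 1}{\alpha_1} + \frac{e^{\alpha_2 x} - 1}{\alpha_2}$$
$$\qquad + (\beta_1 - 1)\log\left[1 - e^{\alpha_1 x - \frac{1}{\alpha_1}(e^{\alpha_1 x} - 1)}\right] - (\beta_2 - 1)\log\left[1 - e^{\alpha_2 x - \frac{1}{\alpha_2}(e^{\alpha_2 x} - 1)}\right].$$

Taking the partial derivative of $\log L$ with respect to x and equating it to 0, that is, $\partial \log L / \partial x = 0$, we get

$$\frac{\alpha_1 e^{\alpha_1 x}}{e^{\alpha_1 x} - \alpha_1} + \alpha_1 - e^{\alpha_1 x} + \frac{(\beta_1 - 1)\left(e^{\alpha_1 x} - \alpha_1\right) e^{\alpha_1 x - \frac{1}{\alpha_1}(e^{\alpha_1 x} - 1)}}{1 - e^{\alpha_1 x - \frac{1}{\alpha_1}(e^{\alpha_1 x} - 1)}} = \frac{\alpha_2 e^{\alpha_2 x}}{e^{\alpha_2 x} - \alpha_2} + \alpha_2 - e^{\alpha_2 x} + \frac{(\beta_2 - 1)\left(e^{\alpha_2 x} - \alpha_2\right) e^{\alpha_2 x - \frac{1}{\alpha_2}(e^{\alpha_2 x} - 1)}}{1 - e^{\alpha_2 x - \frac{1}{\alpha_2}(e^{\alpha_2 x} - 1)}}.$$

That is, RHS = LHS iff $\alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$. Therefore we conclude that the EMD is identifiable in its parameters. That is, $f(\Theta_1) = f(\Theta_2) \Leftrightarrow \Theta_1 = \Theta_2$.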


3.2 Moments 
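Theorem 3.1 If Z has the EMD with the pdf given in (2.3), then the $r^{th}$ raw moment $\mu'_r$ about the origin is given by, for r = 1, 2, ...,

$$\mu'_r = \frac{\beta\,\Gamma(r+1)}{\alpha^{r+1}} \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, e^{\frac{k+1}{\alpha}}\, E^{r}_{-(k+1)}\!\left(\frac{k+1}{\alpha}\right) - \frac{\alpha\beta\,\Gamma(r+1)}{\alpha^{r+1}} \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, e^{\frac{k+1}{\alpha}}\, E^{r}_{-k}\!\left(\frac{k+1}{\alpha}\right), \tag{3.1}$$

where $E^{r}_{s}(z) = \frac{1}{\Gamma(r+1)} \int_1^{\infty} (\log u)^r\, u^{-s} e^{-zu}\, du$ is the generalized integro-exponential function (see, Milgram (1985)), $\Gamma(a) = \int_0^{\infty} x^{a-1} e^{-x}\, dx$ is the gamma function and $(x)_k = x(x + 1)\cdots(x + k - 1)$ for any $k \geq 1$ with $(x)_0 = 1$.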


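Proof. We have,

$$\mu'_r = \int_0^{\infty} z^r f(z)\, dz = M_1 - M_2, \tag{3.2}$$

where,

$$M_1 = \beta \int_0^{\infty} z^r e^{\alpha z}\left[1 - e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}\right]^{\beta - 1} e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}\, dz \tag{3.3}$$

and

$$M_2 = \alpha\beta \int_0^{\infty} z^r \left[1 - e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}\right]^{\beta - 1} e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}\, dz. \tag{3.4}$$

From (3.3), expanding the power by (2.10) and substituting $y = e^{\alpha z}$,

$$M_1 = \beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!} \int_0^{\infty} z^r e^{\alpha z} e^{(k+1)\left\{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)\right\}}\, dz$$
$$\quad = \frac{\beta}{\alpha^{r+1}} \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, e^{\frac{k+1}{\alpha}} \int_1^{\infty} (\log y)^r\, y^{k+1} e^{-\frac{(k+1)y}{\alpha}}\, dy$$
$$\quad = \frac{\beta\,\Gamma(r+1)}{\alpha^{r+1}} \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, e^{\frac{k+1}{\alpha}}\, E^{r}_{-(k+1)}\!\left(\frac{k+1}{\alpha}\right). \tag{3.5}$$

From (3.4),

$$M_2 = \alpha\beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!} \int_0^{\infty} z^r e^{(k+1)\left\{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)\right\}}\, dz$$
$$\quad = \frac{\alpha\beta}{\alpha^{r+1}} \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, e^{\frac{k+1}{\alpha}} \int_1^{\infty} (\log y)^r\, y^{k} e^{-\frac{(k+1)y}{\alpha}}\, dy$$
$$\quad = \frac{\alpha\beta\,\Gamma(r+1)}{\alpha^{r+1}} \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, e^{\frac{k+1}{\alpha}}\, E^{r}_{-k}\!\left(\frac{k+1}{\alpha}\right). \tag{3.6}$$

Substituting (3.5) and (3.6) in (3.2), we get (3.1). Hence the proof.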
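Since (3.1) involves an infinite series of generalized integro-exponential functions, a direct numerical integration is a convenient cross-check. The sketch below does not implement (3.1); it assumes the standard identity $E[Z^r] = r\int_0^{\infty} z^{r-1}\bar{F}(z)\,dz$ for a non-negative random variable, uses a composite Simpson rule, and truncates the integral where the survival function is negligible. All names are illustrative, and $0 < \alpha \leq 1$ is assumed.

```python
import math

def emd_sf(z, alpha, beta):
    """EMD survival function, Eq. (2.4), written to limit cancellation in the tail."""
    if z <= 0:
        return 1.0
    p = math.exp(alpha * z - math.expm1(alpha * z) / alpha)   # phi(z; alpha)
    if p >= 1.0:
        return 1.0
    return -math.expm1(beta * math.log1p(-p))                 # 1 - (1 - p)^beta

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def emd_raw_moment(r, alpha, beta):
    """Approximate E[Z^r] = r * integral of z^(r-1) * S(z) over (0, upper)."""
    upper = math.log1p(50.0 * alpha) / alpha    # phi(upper; alpha) is below exp(-46)
    return r * simpson(lambda z: z ** (r - 1) * emd_sf(z, alpha, beta), 0.0, upper)

if __name__ == "__main__":
    print(emd_raw_moment(1, 0.5, 1.0))   # beta = 1 is the MD
    print(emd_raw_moment(1, 0.5, 3.0), emd_raw_moment(2, 0.5, 3.0))
```

For $\beta = 1$ the substitution $y = e^{\alpha z}/\alpha$ shows that $\int_0^{\infty}\phi(z;\alpha)\,dz = 1$, so the first call should return a value close to 1 for any admissible $\alpha$; this gives a simple check of the integration.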


3.3 Moment generating function 


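Theorem 3.2 If Z has the EMD with the pdf given in (2.3), then the moment generating function of Z is given by,

$$M_Z(t) = \beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{e^{\frac{k+1}{\alpha}}\, \alpha^{\frac{t}{\alpha} + k + 1}}{(k+1)^{\frac{t}{\alpha} + k + 2}}\, \Gamma\!\left(\frac{t}{\alpha} + k + 2, \frac{k+1}{\alpha}\right) - \alpha\beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{e^{\frac{k+1}{\alpha}}\, \alpha^{\frac{t}{\alpha} + k}}{(k+1)^{\frac{t}{\alpha} + k + 1}}\, \Gamma\!\left(\frac{t}{\alpha} + k + 1, \frac{k+1}{\alpha}\right), \tag{3.7}$$

where $\Gamma(a,b) = \int_b^{\infty} t^{a-1} e^{-t}\, dt$ is the upper incomplete gamma function and $(x)_k = x(x + 1)\cdots(x + k - 1)$ for any $k \geq 1$ with $(x)_0 = 1$.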


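Proof. We have,

$$M_Z(t) = E[e^{tZ}] = B_1 - B_2, \tag{3.8}$$

where,

$$B_1 = \beta \int_0^{\infty} e^{tz} e^{\alpha z}\left[1 - e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}\right]^{\beta - 1} e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}\, dz \tag{3.9}$$

and

$$B_2 = \alpha\beta \int_0^{\infty} e^{tz}\left[1 - e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}\right]^{\beta - 1} e^{\alpha z - \frac{1}{\alpha}(e^{\alpha z} - 1)}\, dz. \tag{3.10}$$

From (3.9), expanding the power by (2.10) and substituting $y = e^{\alpha z}$,

$$B_1 = \beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{e^{\frac{k+1}{\alpha}}}{\alpha} \int_1^{\infty} y^{\frac{t}{\alpha} + k + 1} e^{-\frac{(k+1)y}{\alpha}}\, dy$$
$$\quad = \beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{e^{\frac{k+1}{\alpha}}\, \alpha^{\frac{t}{\alpha} + k + 1}}{(k+1)^{\frac{t}{\alpha} + k + 2}}\, \Gamma\!\left(\frac{t}{\alpha} + k + 2, \frac{k+1}{\alpha}\right). \tag{3.11}$$

From (3.10),

$$B_2 = \alpha\beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{e^{\frac{k+1}{\alpha}}}{\alpha} \int_1^{\infty} y^{\frac{t}{\alpha} + k} e^{-\frac{(k+1)y}{\alpha}}\, dy$$
$$\quad = \alpha\beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{e^{\frac{k+1}{\alpha}}\, \alpha^{\frac{t}{\alpha} + k}}{(k+1)^{\frac{t}{\alpha} + k + 1}}\, \Gamma\!\left(\frac{t}{\alpha} + k + 1, \frac{k+1}{\alpha}\right). \tag{3.12}$$

Substituting (3.11) and (3.12) in (3.8), we get (3.7). Thus the theorem is proved.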


3.4 Reliability analysis 


Quantitative evaluation of the mature product at each stage of its life cycle is an important aspect of reliability 
analysis. It also plays a significant role in reliability engineering to assure customer satisfaction. In the 
following subsections, we obtain certain reliability properties of the EMD. 


3.4.1 Vitality function 

In modeling lifetime data, the vitality function is crucial. In engineering and biomedical science, the vitality function, along with the mrlf, plays a major role. If Z is a non-negative random variable having cdf $F(z)$ with pdf $f(z)$, the vitality function associated with the random variable Z is defined as,


V(z) = E[Z|Z > z]. (3.13) 


In the reliability context, (3.13) can be interpreted as the average life span of components whose age exceeds z. The hazard rate reflects the risk of sudden death within a life span, whereas the vitality function provides a more direct measure of the failure pattern, in the sense that it is expressed in terms of an increased average life span.


3.4.2 Mean residual life function 


If a unit survives to age z, the mrlf gives its expected remaining lifetime from that point of time. The concise information provided by the mrlf for establishing a warranty policy and for making maintenance decisions makes it an important measure in reliability modeling. If Z is a non-negative random variable having cdf $F(z)$ with pdf $f(z)$, then the mrlf is given by,


$$m(z) = E[Z - z \mid Z > z]. \tag{3.14}$$


In the following theorem, we provide the vitality function and mrlf of the EMD. 




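Theorem 3.3 If Z has the EMD with the pdf given in (2.3), then,
1. The vitality function of Z,

$$V(z) = \frac{1}{1 - \left[1 - \phi(z;\alpha)\right]^{\beta}} \left\{ \beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{\alpha^{k} e^{\frac{k+1}{\alpha}}}{(k+1)^{k+2}} \left[\log\!\left(\frac{\alpha}{k+1}\right) \Gamma\!\left(k+2, \frac{(k+1)e^{\alpha z}}{\alpha}\right) + \Gamma'\!\left(k+2, \frac{(k+1)e^{\alpha z}}{\alpha}\right)\right] \right.$$
$$\left. \qquad - \alpha\beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{\alpha^{k-1} e^{\frac{k+1}{\alpha}}}{(k+1)^{k+1}} \left[\log\!\left(\frac{\alpha}{k+1}\right) \Gamma\!\left(k+1, \frac{(k+1)e^{\alpha z}}{\alpha}\right) + \Gamma'\!\left(k+1, \frac{(k+1)e^{\alpha z}}{\alpha}\right)\right] \right\}, \tag{3.15}$$

where $\phi(z;\alpha)$ is given in (2.2), $\Gamma(a,b) = \int_b^{\infty} t^{a-1} e^{-t}\, dt$ is the upper incomplete gamma function and $\Gamma'(a,b) = \int_b^{\infty} t^{a-1} \log(t)\, e^{-t}\, dt$ is the first derivative of the upper incomplete gamma function.


2. The mean residual life function of Z,
$$m(z) = V(z) - z, \tag{3.16}$$
where $V(z)$ is given in (3.15).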


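Proof. a) The vitality function is given by,

$$V(z) = E[Z \mid Z > z] = \frac{1}{\bar{F}(z)} \int_z^{\infty} t f(t)\, dt. \tag{3.17}$$

Now,
$$\int_z^{\infty} t f(t)\, dt = V_1 - V_2, \tag{3.18}$$
where,
$$V_1 = \beta \int_z^{\infty} t\, e^{\alpha t}\left[1 - e^{\alpha t - \frac{1}{\alpha}(e^{\alpha t} - 1)}\right]^{\beta - 1} e^{\alpha t - \frac{1}{\alpha}(e^{\alpha t} - 1)}\, dt \tag{3.19}$$
and
$$V_2 = \alpha\beta \int_z^{\infty} t \left[1 - e^{\alpha t - \frac{1}{\alpha}(e^{\alpha t} - 1)}\right]^{\beta - 1} e^{\alpha t - \frac{1}{\alpha}(e^{\alpha t} - 1)}\, dt. \tag{3.20}$$

From (3.19), expanding the power by (2.10) and substituting $y = (k+1)e^{\alpha t}/\alpha$,

$$V_1 = \beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{\alpha^{k} e^{\frac{k+1}{\alpha}}}{(k+1)^{k+2}} \int_{\frac{(k+1)e^{\alpha z}}{\alpha}}^{\infty} \log\!\left(\frac{\alpha y}{k+1}\right) y^{k+1} e^{-y}\, dy$$
$$\quad = \beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{\alpha^{k} e^{\frac{k+1}{\alpha}}}{(k+1)^{k+2}} \left\{ \log(\alpha) \int_{\frac{(k+1)e^{\alpha z}}{\alpha}}^{\infty} y^{k+1} e^{-y}\, dy + \int_{\frac{(k+1)e^{\alpha z}}{\alpha}}^{\infty} \log(y)\, y^{k+1} e^{-y}\, dy - \log(k+1) \int_{\frac{(k+1)e^{\alpha z}}{\alpha}}^{\infty} y^{k+1} e^{-y}\, dy \right\}$$
$$\quad = \beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{\alpha^{k} e^{\frac{k+1}{\alpha}}}{(k+1)^{k+2}} \left[\log\!\left(\frac{\alpha}{k+1}\right) \Gamma\!\left(k+2, \frac{(k+1)e^{\alpha z}}{\alpha}\right) + \Gamma'\!\left(k+2, \frac{(k+1)e^{\alpha z}}{\alpha}\right)\right]. \tag{3.21}$$

From (3.20), in the same way,

$$V_2 = \alpha\beta \sum_{k=0}^{\infty} \frac{(-1)^k (\beta - k)_k}{k!}\, \frac{\alpha^{k-1} e^{\frac{k+1}{\alpha}}}{(k+1)^{k+1}} \left[\log\!\left(\frac{\alpha}{k+1}\right) \Gamma\!\left(k+1, \frac{(k+1)e^{\alpha z}}{\alpha}\right) + \Gamma'\!\left(k+1, \frac{(k+1)e^{\alpha z}}{\alpha}\right)\right]. \tag{3.22}$$

Substituting (3.21) and (3.22) in (3.18) and using equation (3.17), we get (3.15). Hence the proof.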


b) The mrlf is given by,
$$m(z) = E[Z - z \mid Z > z] = E[Z \mid Z > z] - z = V(z) - z,$$

which is immediate from (3.15). 
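The vitality function and the mrlf can also be evaluated by direct numerical integration, which is useful for checking an implementation of (3.15). The sketch below is not an implementation of (3.15): it assumes the standard representation $m(z) = \int_z^{\infty} \bar{F}(t)\,dt / \bar{F}(z)$, a composite Simpson rule, a truncated upper limit, and $0 < \alpha \leq 1$. All function names are illustrative.

```python
import math

def emd_sf(z, alpha, beta):
    """EMD survival function, Eq. (2.4)."""
    if z <= 0:
        return 1.0
    p = math.exp(alpha * z - math.expm1(alpha * z) / alpha)   # phi(z; alpha)
    if p >= 1.0:
        return 1.0
    return -math.expm1(beta * math.log1p(-p))                 # 1 - (1 - p)^beta

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def emd_mrl(z, alpha, beta):
    """m(z) of Eq. (3.14), computed as the integral of S(t) over (z, upper) divided by S(z)."""
    upper = z + math.log1p(50.0 * alpha) / alpha   # S(t) is negligible beyond this point
    return simpson(lambda t: emd_sf(t, alpha, beta), z, upper) / emd_sf(z, alpha, beta)

def emd_vitality(z, alpha, beta):
    """V(z) = z + m(z), which is Eq. (3.16) rearranged."""
    return z + emd_mrl(z, alpha, beta)

if __name__ == "__main__":
    print(emd_mrl(1.0, 0.5, 1.0), math.exp(-0.5))   # MD case, see the note below
    print(emd_vitality(1.0, 0.5, 3.0))
```

For $\beta = 1$ the integral can be done in closed form with the substitution $y = e^{\alpha t}/\alpha$, giving $m(z) = e^{-\alpha z}$ for the MD; the first printed pair uses this as a check. The sketch is intended for moderate z, where $\bar{F}(z)$ is not numerically zero.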


4. Uncertainty measures 




In this section, we derive two recently developed uncertainty measures of EMD in two distinct contexts. 
4.1 Extropy


Extropy, the complementary dual of Shannon entropy, has a number of applications, including the theory of proper scoring rules for alternative forecast distributions (see, Lad et al. (2015)). Entropy and extropy, two information measures, are closely linked. For a random variable Z, its extropy is defined as


$$J(Z) = -\frac{1}{2} \int_0^{\infty} f^2(z)\, dz. \tag{4.1}$$
4.2 Residual extropy 


The concept of residual extropy was formulated by Qiu and Jia (2018) to measure the residual uncertainty of a random variable. For a random variable Z, its residual extropy is defined as (see, Qiu and Jia (2018)),


$$J(Z;t) = -\frac{1}{2\bar{F}^2(t)} \int_t^{\infty} f^2(z)\, dz. \tag{4.2}$$


More recently, Maya and Irshad (2019) developed some non-parametric estimators of the residual extropy function under the $\alpha$-mixing dependence condition.
The following theorems give the extropy and the residual extropy functions of the EMD, respectively.
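Definitions (4.1) and (4.2) can be evaluated numerically for the EMD before turning to the series representations. The sketch below is a hedged illustration only: it integrates $f^2$ with a composite Simpson rule on a truncated range, assumes $0 < \alpha \leq 1$ and $\beta \geq 1$ (so that the pdf is bounded at the origin), and uses illustrative function names.

```python
import math

def emd_pdf(z, alpha, beta):
    """EMD pdf, Eq. (2.3), for z >= 0 and beta >= 1."""
    p = math.exp(alpha * z - math.expm1(alpha * z) / alpha)   # phi(z; alpha)
    q = max(0.0, 1.0 - p)
    return beta * (math.exp(alpha * z) - alpha) * p * q ** (beta - 1.0)

def emd_sf(z, alpha, beta):
    """EMD survival function, Eq. (2.4)."""
    if z <= 0:
        return 1.0
    p = math.exp(alpha * z - math.expm1(alpha * z) / alpha)
    if p >= 1.0:
        return 1.0
    return -math.expm1(beta * math.log1p(-p))

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def emd_residual_extropy(t, alpha, beta):
    """Residual extropy of Eq. (4.2), by numerical integration of f^2 over (t, upper)."""
    upper = t + math.log1p(50.0 * alpha) / alpha
    integral = simpson(lambda z: emd_pdf(z, alpha, beta) ** 2, t, upper)
    return -0.5 * integral / emd_sf(t, alpha, beta) ** 2

def emd_extropy(alpha, beta):
    """Extropy of Eq. (4.1), i.e. the residual extropy at t = 0."""
    return emd_residual_extropy(0.0, alpha, beta)

if __name__ == "__main__":
    print(emd_extropy(0.5, 2.0))
    print(emd_extropy(1e-6, 1.0))   # near the unit exponential limit
```

In the limit $\beta = 1$, $\alpha \to 0$ the EMD reduces to the unit exponential distribution, whose extropy is $-\frac{1}{2}\int_0^{\infty} e^{-2z}\,dz = -\frac{1}{4}$, so the second call should return a value close to $-0.25$.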


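Theorem 4.1 The extropy function for the EMD has the following form:

$$J(Z) = -\frac{\beta^2}{2} \sum_{k=0}^{\infty} \frac{(-1)^k (2\beta - 1 - k)_k}{k!}\, \alpha^{k+3} e^{\frac{k+2}{\alpha}} \left\{ \frac{\Gamma\!\left(k+4, \frac{k+2}{\alpha}\right)}{(k+2)^{k+4}} - \frac{2\,\Gamma\!\left(k+3, \frac{k+2}{\alpha}\right)}{(k+2)^{k+3}} + \frac{\Gamma\!\left(k+2, \frac{k+2}{\alpha}\right)}{(k+2)^{k+2}} \right\}. \tag{4.3}$$

Proof. We have,
$$J(Z) = -\frac{1}{2} \int_0^{\infty} f^2(z)\, dz. \tag{4.4}$$
Expanding $f^2(z)$ by means of (2.10) and evaluating the three resulting integrals term by term gives (4.9), (4.10) and (4.11). Substituting (4.9), (4.10) and (4.11) in (4.4), we get (4.3).
Thus the theorem is proved.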


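Theorem 4.2 The residual extropy function for the EMD has the following form:

$$J(Z;t) = \frac{-\beta^2}{2\left\{1 - \left[1 - \phi(t;\alpha)\right]^{\beta}\right\}^2} \sum_{k=0}^{\infty} \frac{(-1)^k (2\beta - 1 - k)_k}{k!}\, \alpha^{k+3} e^{\frac{k+2}{\alpha}} \left\{ \frac{\Gamma\!\left(k+4, \frac{(k+2)e^{\alpha t}}{\alpha}\right)}{(k+2)^{k+4}} - \frac{2\,\Gamma\!\left(k+3, \frac{(k+2)e^{\alpha t}}{\alpha}\right)}{(k+2)^{k+3}} + \frac{\Gamma\!\left(k+2, \frac{(k+2)e^{\alpha t}}{\alpha}\right)}{(k+2)^{k+2}} \right\}.$$

The proof is similar to that of Theorem 4.1 and is hence omitted.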


5. Estimation and inference 
5.1 Method of maximum likelihood estimation 


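The most commonly used method of parameter estimation is the method of maximum likelihood. Its popularity is due to a number of desirable qualities, including consistency, asymptotic efficiency, invariance, and intuitive appeal. Let $Z_1, Z_2, \ldots, Z_n$ be a random sample of size n from the EMD with unknown parameter vector $\theta = [\alpha, \beta]^T$. Then the likelihood function of $\theta$ is given by,

$$l(\theta) = \prod_{i=1}^{n} f(z_i) = \beta^n \prod_{i=1}^{n} \left(e^{\alpha z_i} - \alpha\right)\phi(z_i;\alpha)\left[1 - \phi(z_i;\alpha)\right]^{\beta - 1},$$

so that the log-likelihood function is

$$\log l(\theta) = n \log \beta + \sum_{i=1}^{n} \log\left(e^{\alpha z_i} - \alpha\right) + \sum_{i=1}^{n} \left\{\alpha z_i - \frac{1}{\alpha}\left(e^{\alpha z_i} - 1\right)\right\} + (\beta - 1) \sum_{i=1}^{n} \log\left[1 - \phi(z_i;\alpha)\right], \tag{5.1}$$

where $\phi(z;\alpha)$ is given in (2.2).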




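The partial derivatives of $\log l(\theta)$ with respect to the parameters are

$$\frac{\partial \log l}{\partial \alpha} = \sum_{i=1}^{n} \frac{z_i e^{\alpha z_i} - 1}{e^{\alpha z_i} - \alpha} + \sum_{i=1}^{n} \kappa(z_i;\alpha) - (\beta - 1) \sum_{i=1}^{n} \frac{\phi(z_i;\alpha)\,\kappa(z_i;\alpha)}{1 - \phi(z_i;\alpha)} \tag{5.2}$$

and

$$\frac{\partial \log l}{\partial \beta} = \frac{n}{\beta} + \sum_{i=1}^{n} \log\left[1 - \phi(z_i;\alpha)\right], \tag{5.3}$$

where $\kappa(z;\alpha) = \partial \log \phi(z;\alpha) / \partial \alpha = z - \frac{z e^{\alpha z}}{\alpha} + \frac{e^{\alpha z} - 1}{\alpha^2}$.

The MLE of the parameters $\theta = (\alpha, \beta)$, say $\hat{\theta} = (\hat{\alpha}, \hat{\beta})$, is obtained by solving the equations $\frac{\partial \log l}{\partial \alpha} = 0$ and $\frac{\partial \log l}{\partial \beta} = 0$. These equations cannot be solved in closed form, so a numerical optimization technique such as the Newton-Raphson method or Fisher's scoring algorithm is needed, using mathematical packages like R and Mathematica. When computing $\hat{\theta}$ with any optimization method, we face the problem that (5.1) may have more than one local maximum, because the optimizer can converge to different maxima from different initial values. To alleviate this problem, we first plot the density function for some parameter values on the histogram of the data, find a parameter vector that fits the histogram well, and use this vector as the initial value for the optimization problem.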
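One simple way to carry out the maximization numerically is to profile out $\beta$: for fixed $\alpha$, setting the derivative of the log-likelihood with respect to $\beta$ to zero gives $\hat{\beta}(\alpha) = -n / \sum_i \log[1 - \phi(z_i;\alpha)]$, which leaves a one-dimensional search over $\alpha$. The sketch below is an illustration under stated assumptions, not the procedure used by the authors: it assumes $0 < \alpha \leq 1$, uses a golden-section search (which presumes a unimodal profile and may otherwise stop at a local maximum), and all function names are hypothetical.

```python
import math

def log_phi(z, alpha):
    """log phi(z; alpha), with phi as in Eq. (2.2)."""
    return alpha * z - math.expm1(alpha * z) / alpha

def loglik(data, alpha, beta):
    """EMD log-likelihood: sum of the logarithm of the pdf (2.3) over the sample."""
    total = 0.0
    for z in data:
        lp = log_phi(z, alpha)
        total += (math.log(beta) + math.log(math.exp(alpha * z) - alpha)
                  + lp + (beta - 1.0) * math.log(-math.expm1(lp)))
    return total

def beta_given_alpha(data, alpha):
    """Solve n / beta + sum(log(1 - phi)) = 0 for beta, with alpha held fixed."""
    s = sum(math.log(-math.expm1(log_phi(z, alpha))) for z in data)
    return -len(data) / s

def fit_emd(data, lo=1e-3, hi=1.0, iters=80):
    """Golden-section search for alpha on the profile log-likelihood."""
    def profile(a):
        return loglik(data, a, beta_given_alpha(data, a))

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = profile(c), profile(d)
    for _ in range(iters):
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = profile(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = profile(d)
    alpha = 0.5 * (a + b)
    beta = beta_given_alpha(data, alpha)
    return alpha, beta, loglik(data, alpha, beta)

if __name__ == "__main__":
    sample = [0.55, 0.93, 1.25, 1.36, 1.49, 1.52, 1.58, 1.61, 1.64, 1.68, 1.73, 1.81, 2.00]
    print(fit_emd(sample))
```

Because the search is one-dimensional, plotting the profile log-likelihood over a grid of $\alpha$ values is an easy way to see whether several local maxima are present, in the spirit of the graphical check described above.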


5.1.1 Fisher information matrix 


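In order to carry out statistical inference on the parameters of the EMD, we need to find the $2 \times 2$ expected Fisher information matrix $I(\theta)$. The expected Fisher information matrix of $\theta$ is given by,

$$I(\theta) = -\begin{pmatrix} E\left(\dfrac{\partial^2 \log l}{\partial \alpha^2}\right) & E\left(\dfrac{\partial^2 \log l}{\partial \alpha\, \partial \beta}\right) \\[2ex] E\left(\dfrac{\partial^2 \log l}{\partial \beta\, \partial \alpha}\right) & E\left(\dfrac{\partial^2 \log l}{\partial \beta^2}\right) \end{pmatrix}. \tag{5.4}$$

The expected Fisher information matrix can be approximated by the observed Fisher information matrix $J(\hat{\theta})$ given by,

$$J(\hat{\theta}) = -\begin{pmatrix} \dfrac{\partial^2 \log l}{\partial \alpha^2} & \dfrac{\partial^2 \log l}{\partial \alpha\, \partial \beta} \\[2ex] \dfrac{\partial^2 \log l}{\partial \beta\, \partial \alpha} & \dfrac{\partial^2 \log l}{\partial \beta^2} \end{pmatrix}_{\theta = \hat{\theta}}. \tag{5.5}$$

That is,
$$\lim_{n \to \infty} \frac{1}{n} J(\hat{\theta}) = I(\theta).$$
For large n, the following approximation can be used,
$$J(\hat{\theta}) \approx n I(\theta).$$

The elements of $J(\hat{\theta})$ are given in the Appendix.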
5.1.2 Asymptotic confidence interval


Here we present the asymptotic confidence intervals for the parameters of the EMD. Let $\hat{\theta} = (\hat{\alpha}, \hat{\beta})$ be the MLE of $\theta = (\alpha, \beta)$. Under the usual regularity conditions, and provided that the parameters are in the interior of the parameter space and not on the boundary, we have $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N_2(0, I^{-1}(\theta))$, where $I(\theta)$ is the expected Fisher information matrix. The asymptotic behavior is still valid if $I(\theta)$ is replaced by the observed Fisher information matrix $J(\hat{\theta})$. The multivariate normal distribution $N_2(0, J^{-1}(\hat{\theta}))$ with mean vector $0 = (0, 0)^T$ can be used to construct confidence intervals for the model parameters. The approximate $100(1 - \varsigma)\%$ two-sided confidence intervals for $\alpha$ and $\beta$ are respectively given by $\hat{\alpha} \pm Z_{\varsigma/2} \sqrt{J^{-1}_{\alpha\alpha}(\hat{\theta})}$ and $\hat{\beta} \pm Z_{\varsigma/2} \sqrt{J^{-1}_{\beta\beta}(\hat{\theta})}$, where $J^{-1}_{\alpha\alpha}(\hat{\theta})$ and $J^{-1}_{\beta\beta}(\hat{\theta})$ are the diagonal elements of $J^{-1}(\hat{\theta})$ and $Z_{\varsigma/2}$ is the upper $\frac{\varsigma}{2}$-th percentile of a standard normal distribution.


5.2 Bayesian analysis 


In this section, we estimate the unknown parameters of the EMD by the Bayesian approach. This approach can incorporate prior information about the problem at hand, and for this reason analysts regard it as a flexible way to estimate the parameters of a model. A statistical comparison between the maximum likelihood and Bayesian procedures for estimating the parameters of the EMD is performed based on the real data sets in Section 6.

Various types of priors are available in the Bayesian approach to handle various situations. Here we focus on the weakly informative prior (WIP). For numerical integration, prior distributions that are not completely flat provide enough information for the numerical approximation algorithm to continue to explore the target posterior density. The half-Cauchy distribution (HCD) has such a shape. For the HCD, the mean and variance do not exist, but its mode is equal to zero. When the scale parameter of the HCD is 25, the density is partially flat. According to Gelman and Hill (2007), depending upon the information necessary, the uniform distribution or the HCD is a better choice of prior. Assume that $\alpha \sim N(\mu_0, \sigma_0^2)$ and $\beta \sim HCD(a)$, where N and HCD stand for the normal and half-Cauchy distributions respectively. The parameters of the prior densities can be fixed as $\mu_0 = 0$, $\sigma_0^2 = 1$ and $a = 25$. Supposing that the parameters are a priori independent, the posterior density of the parameters is given by,

$$\pi(\theta \mid z) \propto \beta^n \prod_{i=1}^{n} \left[1 - e^{\alpha z_i - \frac{1}{\alpha}(e^{\alpha z_i} - 1)}\right]^{\beta - 1} e^{\sum_{i=1}^{n} \left\{\alpha z_i - \frac{1}{\alpha}(e^{\alpha z_i} - 1)\right\}} \prod_{i=1}^{n} \left(e^{\alpha z_i} - \alpha\right) \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\alpha^2}{2}}\, \frac{2 \times 25}{\pi\left(\beta^2 + 25^2\right)}. \tag{5.6}$$
Since (5.6) is not in closed form, one may use numerical integration or MCMC methods. With the Laplace approximation method, any Bayesian model can be fitted for which the likelihood and priors are specified (see, Khan et al. (2016)). Using R software, we solve this problem with the random walk Metropolis (RwM) algorithm, run for 100000 iterations. For the three data sets, the posterior mode and posterior median are computed as Bayesian estimates and compared with the corresponding MLEs in Table 1. The MLE and Bayesian approaches give approximately equal estimates.
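The random walk Metropolis step described above can be sketched in a few lines. The code below is an illustration only and is not the R implementation used for Table 1: the proposal standard deviations, starting values, seed and default number of iterations are arbitrary assumptions, the restriction $0 < \alpha \leq 1$ is assumed as the parameter space of $\alpha$, and the function names are hypothetical.

```python
import math
import random

def log_phi(z, alpha):
    """log phi(z; alpha), with phi as in Eq. (2.2)."""
    return alpha * z - math.expm1(alpha * z) / alpha

def loglik(data, alpha, beta):
    """EMD log-likelihood based on the pdf (2.3)."""
    total = 0.0
    for z in data:
        lp = log_phi(z, alpha)
        total += (math.log(beta) + math.log(math.exp(alpha * z) - alpha)
                  + lp + (beta - 1.0) * math.log(-math.expm1(lp)))
    return total

def log_post(data, alpha, beta, mu0=0.0, sigma0=1.0, scale=25.0):
    """Unnormalised log posterior: N(mu0, sigma0^2) prior on alpha, HCD(scale) on beta."""
    if not (0.0 < alpha <= 1.0) or beta <= 0.0:
        return -math.inf                      # outside the assumed parameter space
    log_prior = -0.5 * ((alpha - mu0) / sigma0) ** 2 - math.log(scale ** 2 + beta ** 2)
    return loglik(data, alpha, beta) + log_prior

def rw_metropolis(data, n_iter=5000, start=(0.5, 1.0), step=(0.05, 0.3), seed=1):
    """Random walk Metropolis with independent Gaussian proposals for alpha and beta."""
    rng = random.Random(seed)
    cur = start
    cur_lp = log_post(data, *cur)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = (cur[0] + rng.gauss(0.0, step[0]), cur[1] + rng.gauss(0.0, step[1]))
        prop_lp = log_post(data, *prop)
        if math.log(1.0 - rng.random()) < prop_lp - cur_lp:   # accept with prob min(1, ratio)
            cur, cur_lp = prop, prop_lp
            accepted += 1
        chain.append(cur)
    return chain, accepted / n_iter

if __name__ == "__main__":
    sample = [0.55, 0.93, 1.25, 1.36, 1.49, 1.52, 1.58, 1.61, 1.64, 1.68, 1.73, 1.81, 2.00]
    chain, rate = rw_metropolis(sample)
    kept = sorted(b for _, b in chain[len(chain) // 2:])      # discard the first half as burn-in
    print(rate, kept[len(kept) // 2])                         # acceptance rate, posterior median of beta
```

In practice the proposal scales would be tuned so that the acceptance rate is moderate, and convergence would be assessed with trace plots or multiple chains before reporting posterior summaries.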


Table 1: Comparison between MLE and Bayesian estimates.


Data set   Parameter   MLE     Posterior median   Posterior mode
1          α           1       1                  0.98
           β           0.46    0.38               0.44
2          α           0.94    0.93               0.92
           β           4.07    3.95               4.14
3          α           0.53    0.52               0.52
           β           14.14   13.61              13.88
5.3 Simulation


Our simulation study is based on generating observations with R software using three combinations of parameters, in order to study the asymptotic behaviour of the MLEs of the parameters of the EMD. To obtain a good estimate of an estimator's variance, Efron and Tibshirani (1991) recommended a maximum of 200 bootstrap samples. From the EMD, we generated 200 bootstrap samples of sizes 25, 50, 75, 100, 150, 250, 500, and 1000 for different parameter combinations. Our results are shown in Table 2. From Table 2, one can infer that the estimates are quite stable and, more precisely, are close to the true parameter values for these sample sizes, while the MSEs of the estimators decrease as n increases. In addition, for each parameter, the coverage probabilities (CP) are fairly close to the 95% nominal level. As n increases, the average lengths (ALs) for each parameter decrease to zero.


Table 2: The results of the simulation study.


α = 0.25, β = 5
Parameter   n      MLE      Bias     MSE      CP      AL
α           25     0.2771   0.0271   0.0096   0.910   0.3669
            50     0.2694   0.0194   0.0045   0.935   0.2590
            75     0.2637
            100    0.2609
            150    0.2563
            250    0.2535
            500    0.2523
            1000   0.2484
β           25     5.2977
            50     5.1528
            75     5.1734
            100    5.1314   0.1314   0.2855   0.975   2.0178
            150    5.1067   0.1067   0.1567   0.970   1.6383
            250    5.0149   0.0149   0.0786   0.980   1.2451
            500    5.0144   0.0144   0.0461   0.955   0.8801
            1000   5.0072   0.0072   0.0272   0.960   0.6213

α = 0.5, β = 3
Parameter   n      MLE      Bias     MSE      CP      AL
α           25     0.5285   0.0285   0.0137   0.915   0.4360
            50     0.5219   0.0219   0.0063   0.935   0.3077
            75     0.5168
            100    0.5125
            150    0.5071
            250    0.5042
            500    0.5027
            1000   0.4981
β           25     3.1617
            50     3.0872
            75     3.0940
            100    3.0724   0.0724   0.1009   0.980   1.2062
            150    3.0602   0.0602   0.0549   0.975   0.9805
            250    3.0069   0.0069   0.0283   0.980   0.7461
            500    3.0073   0.0073   0.0165   0.955   0.5276
            1000   3.0050   0.0050   0.0098   0.950   0.3728

α = 0.75, β = 4
Parameter   n      MLE      Bias     MSE      CP      AL
α           25     0.7733   0.0233   0.0108   0.920   0.4051
            50     0.7697   0.0197   0.0053   0.925   0.2853
            75     0.7639   0.0139   0.0036   0.955   0.2321
            100    0.7612   0.0112   0.0026   0.955   0.2012
            150    0.7563
            250    0.7543
            500    0.7530
            1000   0.7483
β           25     4.2948
            50     4.1771
            75     4.1670
            100    4.1283
            150    4.0995
            250    4.0208   0.0208   0.0546   0.980   1.0279
            500    4.0174   0.0174   0.0315   0.960   0.7256
            1000   4.0036   0.0036   0.0183   0.960   0.5107


6. Application 


In this section, three data sets are used to illustrate the usefulness of the EMD (EMD($\alpha, \beta$)). The first data set was taken from the website of the Bureau of Meteorology of the Australian Government. The data comprise the total monthly rainfall (in mm) collected from January 2000 to February 2007 at the rain gauge station of Carrol, located in the state of New South Wales on the east coast of Australia (see, Jodra et al. (2015)). The data set is listed below.


12.0, 22.7, 75.5, 28.6, 65.8, 39.4, 33.1, 84.0, 41.6, 62.3, 52.5, 13.9, 15.4, 31.9, 32.5, 37.7, 9.5, 49.9, 31.8, 32.2,
50.2, 55.8, 20.4, 5.9, 10.1, 44.5, 19.7, 6.4, 29.2, 42.5, 19.4, 23.8, 55.2, 7.7, 0.8, 6.7, 4.8, 73.8, 5.1, 7.6, 25.7, 
50.7, 59.7, 57.2, 29.7, 32.0, 24.5, 71.6, 15.0, 17.7, 8.2, 23.8, 46.3, 36.5, 55.2, 37.2, 33.9, 53.9, 51.6, 17.3, 
85.7, 6.6, 4.7, 1.8, 98.7, 62.8, 59.0, 76.1, 67.9, 73.7, 27.2, 39.5, 6.9, 14.0, 3.0, 41.6, 49.5, 11.2, 17.9, 12.7, 
0.8, 21.1, 24.5. 


The second data set is the strength of glass fibers of length 1.5 cm from the National Physical Laboratory 
in England (see, Smith and Naylor (1987)). The data set is listed below. 


0.55, 0.93, 1.25, 1.36, 1.49, 1.52, 1.58, 1.61, 1.64, 1.68, 1.73, 1.81, 2.00, 0.74, 1.04, 1.27, 1.39, 1.49, 1.53, 
1.59, 1.61, 1.66, 1.68, 1.76, 1.82, 2.01, 0.77, 1.11, 1.28, 1.42, 1.50, 1.54, 1.60, 1.62, 1.66, 1.69, 1.76, 1.84, 
2.24, 0.81, 1.13, 1.29, 1.48, 1.50, 1.55, 1.61, 1.62, 1.66, 1.70, 1.77, 1.84, 0.84, 1.24, 1.30, 1.48, 1.51, 1.55, 
1.61, 1.63, 1.67, 1.70, 1.78, 1.89. 
The third data set, originally reported by Bader and Priest (1982), consists of failure stresses of single carbon fibers of length 50 mm. The data set is listed below.


1.339, 1.434, 1.549, 1.574, 1.589, 1.613, 1.746, 1.753, 1.764, 1.807, 1.812, 1.84, 1.852, 1.852, 1.862, 1.864,
1.931, 1.952, 1.974, 2.019, 2.051, 2.055, 2.058, 2.088, 2.125, 2.162, 2.171, 2.172, 2.18, 2.194, 2.211, 2.27, 
2.272, 2.28, 2.299, 2.308, 2.335, 2.349, 2.356, 2.386, 2.39, 2.41, 2.43, 2.431, 2.458, 2.471, 2.497, 2.514, 
2.558, 2.577, 2.593, 2.601, 2.604, 2.62, 2.633, 2.67, 2.682, 2.699, 2.705, 2.735, 2.785, 3.02, 3.042, 3.116, 
3.174. 


We compare the fit of the EMD with that of the Muth distribution (MD($\alpha$)) and some other competing models, namely the exponential distribution (ED($\lambda$)), generalized exponential distribution (GED($\alpha, \lambda$)), lognormal distribution (LND($\mu, \sigma$)), scaled Muth distribution (SMD($\alpha, \beta$)) and gamma distribution (GD(a, b)). For the above three data sets, we have computed the MLEs of the parameters and calculated the AIC (Akaike information criterion), AICc (corrected Akaike information criterion), CAIC (consistent Akaike information criterion) and BIC (Bayesian information criterion). All these criteria are calculated using R software. Each of these criteria takes into account the likelihood, the number of observations, and the number of parameters. The numerical values of AIC, AICc, CAIC and BIC of all fitted models for the above three data sets are given in Tables 3-5, respectively.
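The four criteria are simple functions of the maximized log-likelihood, the number of parameters k and the sample size n. The sketch below assumes the usual definitions (AIC $= 2k - 2\log l$, AICc $=$ AIC $+ 2k(k+1)/(n-k-1)$, BIC $= k\log n - 2\log l$, CAIC $= k(\log n + 1) - 2\log l$), which are not written out in the text; with a log-likelihood of $-43.3327$, $k = 2$ and the $n = 83$ listed rainfall observations they reproduce the EMD row of Table 3. The function name is illustrative.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, AICc, CAIC and BIC for a model with k parameters fitted to n observations."""
    aic = 2.0 * k - 2.0 * loglik
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2.0 * loglik
    caic = k * (math.log(n) + 1.0) - 2.0 * loglik
    return {"AIC": aic, "AICc": aicc, "CAIC": caic, "BIC": bic}

if __name__ == "__main__":
    # EMD fitted to the first (rainfall) data set: log-likelihood -43.3327, k = 2, n = 83.
    for name, value in information_criteria(-43.3327, 2, 83).items():
        print(name, round(value, 4))
```

Smaller values indicate a better trade-off between fit and complexity, which is how the tables that follow are read.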


Table 3: MLEs, log-likelihood, AIC, AICc, CAIC and BIC for the first data set.


Model       MLE                  Log-likelihood   AIC        AICc       CAIC       BIC        KS
EMD(α,β)    α = 1, β = 0.46      -43.3327         90.6654    90.8154    97.5031    95.5032    0.0445
MD(α)       α = 0.37             -55.7681         113.5362   113.5856   116.9550   115.9550   0.2019
SMD(α,β)    α = 0.46, β = 0.68   -43.4821         90.9642    91.1142    97.8019    95.8019    0.0570
ED(λ)       λ = 1.47             -50.8281         103.6562   103.7056   107.0750   106.0750   0.1186
GED(α,λ)    α = 1.51             -47.3560         98.7120    98.8620    105.5497   103.5497   0.0873
GD(a,b)     a = 1.51             -46.9565         97.9129    98.8620    104.7507   102.7506   0.0837
LND(μ,σ)    μ = 0.75             -56.8982         117.7964   117.9464   124.6340   122.6340   0.1217


Table 4: MLEs, log-likelihood, AIC, AICc, CAIC and BIC for the second data set.


Model       MLE          Log-likelihood   AIC        AICc       CAIC       BIC        KS
EMD(α,β)                 -19.3903         42.7805    42.9805    49.0668    47.0669    0.2018
MD(α)                    -58.7365         119.4730   119.5386   122.6161   121.6161   0.5357
SMD(α,β)                 -39.3860         82.7720    82.9720    89.0583    87.0583    0.3265
ED(λ)                    -88.8303         179.6606   179.7262   182.7262   181.8037   0.4160
GED(α,λ)                 -31.3835         66.7669    66.9669    73.0532    71.0533    0.2282
GD(a,b)     a = 17.44    -23.9515         51.9031    52.1031    58.1894    56.1893    0.2158
LND(μ,σ)    μ = 0.38     -28.0049         60.0099    60.2099    66.2961    64.2961    0.2328
Table 5: MLEs, log-likelihood, AIC, AICc, CAIC and BIC for the third data set.


Model       MLE           Log-likelihood   AIC        AICc       CAIC       BIC        KS
EMD(α,β)                  -34.9967         73.9934    74.1869    80.3421    78.3421    0.0723
MD(α)                     -134.7635        271.5270   271.5905   274.7014   273.7014   0.7477
SMD(α,β)    71.96         -64.6493         133.2985   133.492    139.6473   137.6474   0.3294
ED(λ)       λ = 0.45      -117.5382        237.0765   237.14     240.2509   239.2508   0.4678
GED(α,λ)    α = 176.35    -38.3621         80.7242    80.9178    85.0730    85.0730    0.0983
GD(a,b)                   -35.0701         74.1402    74.3337    80.4890    78.4889    0.0724
LND(μ,σ)                  -35.7676         75.5352    75.7287    81.8839    79.8840    0.0838


Figure 3: Estimated pdf and cdf plots of the fitted distribution of the rainfall data.


From each table it can be observed that the EMD has the smallest values of AIC, AICc, CAIC and BIC, and thus one can conclude that the EMD performs better than the other competing models. Further, we have conducted the Kolmogorov-Smirnov (KS) test to check the goodness of fit of the EMD and of the other models for all the data sets. The value of the KS statistic is included in the final column of Tables 3-5.

The KS statistic is smallest for the EMD in each table, which shows that the EMD fits these data better than all the other models. The good performance of the proposed distribution is also visible from Figure 3, Figure 4 and Figure 5.

To test the null hypothesis $H_0$: MD versus $H_1$: EMD, or equivalently $H_0: \beta = 1$ versus $H_1: \beta \neq 1$, we use the likelihood ratio (LR) test for each data set. Table 6 includes the LR statistics and corresponding p-values for all three data sets. Given the values of the test statistics and their associated p-values, we reject the null hypothesis for all data sets and conclude that the EMD model provides a significantly better representation of the distribution of these data sets than the MD.


Table 6: LR statistics and their p-values on data sets 1, 2 and 3.


Data set   LR         p-value
1          24.8708    0.0000006
2          78.6920    < 2.2 x 10^-16
3          199.5300   < 2.2 x 10^-16


Figure 5: Estimated pdf and cdf plots of the fitted distribution of the carbon fibres data.
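The LR statistic is twice the difference between the maximized log-likelihoods of the EMD and the MD. The sketch below assumes, as is standard for one restricted parameter, that the statistic is referred to a chi-square distribution with one degree of freedom, whose upper tail can be written with the complementary error function; this reference distribution is an assumption, since the text does not state it. With the log-likelihoods $-55.7681$ (MD) and $-43.3327$ (EMD) for the first data set it reproduces the first row of Table 6.

```python
import math

def lr_test(loglik_null, loglik_alt):
    """LR statistic and chi-square(1) p-value for H0: beta = 1 (MD) against H1: EMD."""
    lr = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

if __name__ == "__main__":
    lr, p = lr_test(-55.7681, -43.3327)   # first data set: MD versus EMD
    print(round(lr, 4), p)
```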


7. Concluding remarks 


The baseline model considered here was first introduced by the French biologist Teissier (1934), who used it to study the mortality of animal species dying of pure ageing, that is, not from accidents or disease. A modified version of the Teissier model was obtained and studied by Laurent (1975). Muth (1977) examined this model and found that it exhibited a heavier tail than commonly used lifetime distributions like the gamma, lognormal and Weibull. Using this model, Rinne (1981) estimated the lifetime distribution (with lifetime expressed in kilometres) from a German data set based on prices of used cars. Afterwards, Jodra et al. (2015) termed it the Muth distribution and explored its significant properties through the Lambert W function. In this study we introduced a more flexible form of the Muth distribution, namely the exponentiated Muth distribution, by introducing an additional shape parameter $\beta$. Certain important distributional properties, such as closed-form expressions for the moments and the moment generating function, as well as analytical expressions for some reliability measures such as the vitality function and the mrlf, are obtained. Closed-form expressions for two recently developed uncertainty measures, extropy and residual extropy, are provided. Estimation of the model parameters was carried out by the maximum likelihood method and examined by simulation studies. For the three data sets, the posterior mode and posterior median estimates are computed as Bayesian estimates and compared with the corresponding MLEs. The Bayesian estimates are approximately equal to the MLEs. The superiority of the proposed model over the competing models has been illustrated using three real data sets, for which the LR test is also performed, and it is concluded that the EMD can be considered a good candidate for reliability analysis.
Chapter 17


Exponentiated Discrete Modified Lindley 
Distribution and its Applications in the 
Healthcare Sector 


Lishamol Tomy, Veena G and Christophe Chesneau


1. Introduction 


Statistical distributions supported on the range $[0, \infty)$ are widely used to explain and study real-world data. Numerous traditional distributions and their generalizations have been widely used in recent years for studying data from a wide range of sectors, including engineering, medical science, finance, human physiology, wind power, and reliability analysis.

In certain scenarios, lifetimes must be measured on a discrete rather than a continuous scale. Examples include the number of women who are working on shells for 5 weeks and the survival times, in months, of individuals infected with a virus, among many others. Recent works in the area include those of Eldeeb et al. (2021), Eliwa and El-Morshedy (2021), and Almetwally and Ibrahim (2020). Given the numerous discrete distributions used in research, one may develop novel discrete distributions with different properties that are suitable for a wide range of applications.

One such discrete distribution is the D-ML distribution introduced by Tomy et al. (2021). It is developed by discretizing the Modified Lindley distribution studied by Chesneau et al. (2021) via the survival discretization scheme. The distribution function (df) and the corresponding probability mass function (pmf) of the D-ML distribution can be expressed as follows:

$$W(y;\alpha) = 1 - \alpha^{y+1} + \frac{\log \alpha}{1 - \log \alpha}\,(y+1)\,\alpha^{2(y+1)}, \quad y \in \mathbb{N}_0,$$

and

$$w(y;\alpha) = \frac{\alpha^{y}}{1 - \log \alpha} \left[(1 - \log \alpha)(1 - \alpha) - \alpha^{y} \log \alpha \left(y - \alpha^{2}(y+1)\right)\right], \quad y \in \mathbb{N}_0,$$

respectively, where $\mathbb{N}_0$ is the set of whole numbers and $\alpha \in (0,1)$. One of the vital properties of this model is that it can be used not only for positively skewed data sets but also for modelling increasing, decreasing or unimodal failure rates. It can be noted that the distribution models such data more efficiently than the discrete Lindley distribution and other discrete compound Poisson distributions.

In this paper, we establish a novel count distribution with two parameters $\alpha$ and $\beta$, called the exponentiated discrete Modified Lindley (E-DML) distribution. The following are some of the features of this distribution. Both the reliability function (rf) and the hazard rate function (hrf) have closed forms. Furthermore, because its hrf may take on a variety of decreasing forms, the parameters of the distribution can be adjusted to fit count data sets. Finally, the proposed E-DML distribution fits the count data considered here best, despite the fact that it has only two parameters. We believe that the E-DML distribution is suitable for a wide range of applications in disciplines such as medicine, technology, and others.

The rest of the paper is laid out as follows. The construction of the E-DML distribution is discussed in Section 2. The moments associated with the E-DML distribution are studied in Section 3. The order statistics of the distribution, as well as the L-moment statistic, are investigated in Section 4. The maximum likelihood estimation technique is used to estimate the model parameters in Section 5. Section 6 discusses applications to specific data sets. Finally, the conclusions are presented in Section 7.


2. The E-DML distribution 


One of the most popular schemes used in generalizing distributions when it comes to modelling lifetime data is the exponentiated-G scheme based on a df $G(y;\alpha)$. Applying this approach, for $\beta > 0$, the df of the exponentiated-G class of distributions is given by,

$$F(y;\alpha,\beta) = \left[G(y;\alpha)\right]^{\beta}.$$

For a detailed review of the exponentiated-G technique, we refer the readers to Lehmann (1952).
A random variable (rv) Y is said to have the E-DML distribution with parameters $\alpha$ and $\beta$ if the df and pmf of Y are given by,

$$G_{E\text{-}DML}(y;\alpha,\beta) = \frac{\Lambda(y+1;\alpha,\beta)}{(1 - \log \alpha)^{\beta}}$$

and

$$g_{E\text{-}DML}(y;\alpha,\beta) = \frac{1}{(1 - \log \alpha)^{\beta}} \left[\Lambda(y+1;\alpha,\beta) - \Lambda(y;\alpha,\beta)\right],$$

respectively, where $y \in \mathbb{N}_0$ and $\Lambda(y;\alpha,\beta) = \left[(1 - \log \alpha)(1 - \alpha^{y}) + \log \alpha\,(y \alpha^{2y})\right]^{\beta}$.
Figure 1 illustrates the plots of $g_{E\text{-}DML}(y;\alpha,\beta)$ for various values of the parameters $\alpha$ and $\beta$.

Figure 1: Shapes of $g_{E\text{-}DML}(y;\alpha,\beta)$ for selected values of $\alpha$ and $\beta$.


From the plots in Figure 1, as the pmf of the E-DML distribution is log-concave, thus, it can be deduced 
that the distribution is always unimodal. The distribution also has a long right tail for some values of a and f. 
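The hrf can be defined as follows:


$$h_{E\text{-}DML}(y;\alpha,\beta) = \frac{g_{E\text{-}DML}(y;\alpha,\beta)}{R_{E\text{-}DML}(y;\alpha,\beta)} = \frac{A(y+1;\alpha,\beta) - A(y;\alpha,\beta)}{(1-\log\alpha)^{\beta} - A(y+1;\alpha,\beta)},$$


where $R_{E\text{-}DML}(y;\alpha,\beta) = \dfrac{(1-\log\alpha)^{\beta} - A(y+1;\alpha,\beta)}{(1-\log\alpha)^{\beta}}$.
In addition, the reversed hazard rate function (rhrf) is formulated as:


$$r_{E\text{-}DML}(y;\alpha,\beta) = 1 - \frac{A(y;\alpha,\beta)}{A(y+1;\alpha,\beta)}.$$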


Figure 2 shows the rhrf plots for varying parameter values of the E-DML distribution.
Figure 2: Shapes of the rhrf of the E-DML distribution for selected values of the parameters.


From Figure 2, we can observe that the rhrf is either decreasing or constant in y, depending on the values
of the parameters.


3. Moments 


In this section, the moments of the E-DML distribution are developed. From them, one can analyse the
mean, spread, symmetry, and peakedness of the distribution.
The $r$th moment of a rv Y following the E-DML distribution, denoted by $\mu'_r$, is given as follows:


$$\mu'_r = E(Y^r) = \sum_{y=0}^{\infty} y^r g_{E\text{-}DML}(y;\alpha,\beta) = \frac{1}{(1-\log\alpha)^{\beta}} \sum_{y=0}^{\infty} y^r\,[A(y+1;\alpha,\beta) - A(y;\alpha,\beta)]. \quad (1)$$

We recall that the reliability function of Y is defined by,


$$R_{E\text{-}DML}(y;\alpha,\beta) = P(Y > y) = \frac{(1-\log\alpha)^{\beta} - A(y+1;\alpha,\beta)}{(1-\log\alpha)^{\beta}}.$$

The moment $\mu'_r$, as given in Equation (1), can be rewritten in terms of the reliability function as follows:

$$\mu'_r = \sum_{y=1}^{\infty} [y^r - (y-1)^r]\, R_{E\text{-}DML}(y-1;\alpha,\beta) = \frac{1}{(1-\log\alpha)^{\beta}} \sum_{y=1}^{\infty} [y^r - (y-1)^r]\,[(1-\log\alpha)^{\beta} - A(y;\alpha,\beta)]. \quad (2)$$


From Equation (2), the mean ($\mu$) and variance ($\mu_2$) of the E-DML distribution are as follows:

$$\mu = \frac{1}{(1-\log\alpha)^{\beta}} \sum_{y=1}^{\infty} [(1-\log\alpha)^{\beta} - A(y;\alpha,\beta)]$$

and

$$\mu_2 = \frac{1}{(1-\log\alpha)^{\beta}} \sum_{y=1}^{\infty} (2y-1)\,[(1-\log\alpha)^{\beta} - A(y;\alpha,\beta)] - \mu^{2}.$$


The $\mu$ and $\mu_2$ of the E-DML distribution must be calculated numerically, since a closed form of the $r$th
moment cannot be found. Measures of central tendency and dispersion can then be studied from these
numerical values.

Tables 1-2 list $\mu$ and $\mu_2$ of the E-DML distribution as the parameters $\alpha$ and $\beta$ are allowed to vary.

Table 1: Values of $\mu$ of the E-DML distribution for various values of $\alpha$ and $\beta$.

β \ α    0.1      0.2      0.3      0.4      0.5      0.6
2        0.2249   0.5012   0.8432   1.282    1.8716   2.6770
3        0.3212   0.6848   1.1096   1.638    2.335    3.265
4        0.4083   0.8363   1.315    1.904    2.677    3.689
5        0.3212   0.9627   1.480    2.114    2.947    4.016

Table 2: Values of $\mu_2$ of the E-DML distribution for various values of $\alpha$ and $\beta$.

β \ α    0.1      0.2      0.3      0.4      0.5      0.6
2        0.2240   0.5051   0.8902   1.4771   2.413    3.790
3        0.2923   0.5929   0.9826   1.587    2.554    3.880
4        0.3402   0.6324   1.016    1.638    2.625    3.881
5        0.2923   0.6465   1.030    1.673    2.669    3.847
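
The numerical evaluation of the mean and variance can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the reliability-function form of Equation (2), with $P(Y \ge y) = 1 - A(y;\alpha,\beta)/(1-\log\alpha)^{\beta}$, and the function names `edml_A`, `edml_survival` and `edml_mean_var` as well as the truncation rule are hypothetical choices made here. For $\alpha = 0.1$ and $\beta = 2$ the result should be close to the first entries of Tables 1 and 2; for larger $\alpha$ the tail decays more slowly, so the value depends on where the infinite sum is truncated.

```python
import math


def edml_A(y, alpha, beta):
    """A(y; alpha, beta) = [(1 - log a)(1 - a^y) + log(a) * y * a^(2y)]^beta."""
    log_a = math.log(alpha)
    base = (1.0 - log_a) * (1.0 - alpha ** y) + log_a * y * alpha ** (2 * y)
    return max(base, 0.0) ** beta


def edml_survival(y, alpha, beta):
    """Reliability function R(y) = P(Y > y) = 1 - A(y + 1) / (1 - log a)^beta."""
    return 1.0 - edml_A(y + 1, alpha, beta) / (1.0 - math.log(alpha)) ** beta


def edml_mean_var(alpha, beta, tol=1e-12, max_terms=100_000):
    """Mean and variance from the reliability-function sums.

    The infinite sums are truncated once P(Y >= y) drops below `tol`
    (an assumption of this sketch, not a rule taken from the text).
    """
    mean = 0.0
    second_moment = 0.0
    for y in range(1, max_terms + 1):
        tail = edml_survival(y - 1, alpha, beta)  # P(Y >= y)
        if tail < tol:
            break
        mean += tail
        second_moment += (2 * y - 1) * tail
    return mean, second_moment - mean ** 2


mean, var = edml_mean_var(alpha=0.1, beta=2.0)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```

Summing tail probabilities avoids forming $y^r g(y)$ term by term and makes the truncation error easy to control through `tol`.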


We formulate the following remarks:


• For a fixed $\beta$, $\mu$ and $\mu_2$ increase as $\alpha \to 1$.


• The E-DML distribution is well suited to modelling data that are either over- or under-dispersed, since
the parameters of the distribution can be adjusted to fit most data sets.


The skewness (Sk) and kurtosis (Kurt) can be obtained numerically as follows,


$$\mathrm{Sk} = \frac{\mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^{3}}{\mu_2^{3/2}}$$


and


$$\mathrm{Kurt} = \frac{\mu'_4 - 4\mu'_3\mu'_1 - 3(\mu'_2)^{2} + 12\mu'_2(\mu'_1)^{2} - 6(\mu'_1)^{4}}{\mu_2^{2}}.$$


Tables 3-4 report the Sk and Kurt of the E-DML distribution.


Table 3: Sk of the E-DML distribution for various values of $\alpha$ and $\beta$.

β \ α    0.1      0.2      0.3      0.4      0.5      0.6
2        2.150    1.587    1.453    1.422    1.329    1.100
3        1.606    1.224    1.240    1.293    1.212    0.978
4        1.2711   1.047    1.1179   1.254    1.150    0.910
5        1.6065   0.9683   1.1751   1.232    1.102    0.867

Table 4: Kurt of the E-DML distribution for various values of $\alpha$ and $\beta$.

β \ α    0.1      0.2      0.3      0.4      0.5      0.6
2        8.038    6.426    6.374    6.278    5.519    4.211
3        5.646    5.346    5.827    5.856    5.054    3.793
4        4.584    5.068    5.736    5.670    4.772    3.352
5        5.646    5.076    5.724    5.521    4.565    3.395

The following observations are made:
• The E-DML distribution has a long right tail for the values of $\alpha$ and $\beta$ considered.
• The Sk and Kurt of the distribution decrease as $\beta$ increases for a fixed value of $\alpha$.


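For a rv Y following the E-DML distribution, the probability generating function is defined by,

$$\omega_Y(q) = E(q^{Y}) = \sum_{y=0}^{\infty} q^{y} g_{E\text{-}DML}(y;\alpha,\beta) = 1 + \frac{q-1}{(1-\log\alpha)^{\beta}} \sum_{y=1}^{\infty} q^{y-1}\,[(1-\log\alpha)^{\beta} - A(y;\alpha,\beta)],$$


where $q \in (-1,1)$. With the help of $\omega_Y(q)$, the mean can be obtained by differentiating $\omega_Y(q)$ once and
evaluating at $q = 1$, and the variance by using the following formula:
$(d^{2}\omega_Y(q)/dq^{2})|_{q=1} + (d\omega_Y(q)/dq)|_{q=1} - [(d\omega_Y(q)/dq)|_{q=1}]^{2}$.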


4. Order statistics and L-moment statistics 


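Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from the E-DML distribution, and let $Y_{1:n}, Y_{2:n}, \ldots, Y_{n:n}$ be the
corresponding order statistics. Then, for an integer value of y, the df of $Y_{i:n}$ can be represented as follows:


$$G_{i:n}(y;\alpha,\beta) = \sum_{m=i}^{n} \binom{n}{m} [G_{E\text{-}DML}(y;\alpha,\beta)]^{m}\,[1 - G_{E\text{-}DML}(y;\alpha,\beta)]^{n-m}.$$


Furthermore, the pmf of $Y_{i:n}$ is given as follows:


$$g_{i:n}(y;\alpha,\beta) = \sum_{m=i}^{n}\sum_{j=0}^{n-m} (-1)^{j} \binom{n}{m}\binom{n-m}{j}\, \frac{A(y+1;\alpha,\beta(m+j)) - A(y;\alpha,\beta(m+j))}{(1-\log\alpha)^{\beta(m+j)}},$$


and the $r$th moment of $Y_{i:n}$ is expressed as follows:


$$E(Y_{i:n}^{r}) = \sum_{y=0}^{\infty}\sum_{m=i}^{n}\sum_{j=0}^{n-m} (-1)^{j} \binom{n}{m}\binom{n-m}{j}\, y^{r}\, \frac{A(y+1;\alpha,\beta(m+j)) - A(y;\alpha,\beta(m+j))}{(1-\log\alpha)^{\beta(m+j)}}.$$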
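
The double-sum form of the order-statistic pmf can be checked numerically against the direct difference of the order-statistic df. The sketch below is an illustration under stated assumptions: the order statistic is indexed as $Y_{i:n}$, the E-DML df is taken as $A(y+1;\alpha,\beta)/(1-\log\alpha)^{\beta}$, and all function names are hypothetical.

```python
import math


def edml_cdf(y, alpha, beta):
    """E-DML df G(y) = A(y + 1; alpha, beta) / (1 - log alpha)^beta, with G(y) = 0 for y < 0."""
    if y < 0:
        return 0.0
    log_a = math.log(alpha)
    k = y + 1
    base = (1.0 - log_a) * (1.0 - alpha ** k) + log_a * k * alpha ** (2 * k)
    return (base / (1.0 - log_a)) ** beta


def order_cdf(y, i, n, alpha, beta):
    """df of the i-th order statistic: binomial sum over m = i, ..., n."""
    g = edml_cdf(y, alpha, beta)
    return sum(math.comb(n, m) * g ** m * (1.0 - g) ** (n - m) for m in range(i, n + 1))


def order_pmf_direct(y, i, n, alpha, beta):
    """pmf as the difference of the order-statistic df at y and y - 1."""
    return order_cdf(y, i, n, alpha, beta) - order_cdf(y - 1, i, n, alpha, beta)


def order_pmf_series(y, i, n, alpha, beta):
    """pmf via the double sum; each term is an E-DML pmf with shape beta * (m + j)."""
    total = 0.0
    for m in range(i, n + 1):
        for j in range(n - m + 1):
            shape = beta * (m + j)
            term = edml_cdf(y, alpha, shape) - edml_cdf(y - 1, alpha, shape)
            total += (-1) ** j * math.comb(n, m) * math.comb(n - m, j) * term
    return total


for y in range(4):
    print(y, order_pmf_direct(y, 2, 5, 0.5, 2.0), order_pmf_series(y, 2, 5, 0.5, 2.0))
```

The two columns printed for each y should agree up to floating-point rounding, because raising the E-DML df to the power $m + j$ simply multiplies the shape parameter $\beta$ by $m + j$.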


Hosking (1990) developed L-moments (LM) to summarise theoretical distributions and observed samples.
It was also demonstrated that LM are suitable indicators of a distribution's shape and can
be used to fit distributions to data. They are expectations of certain linear combinations of order statistics.
The LM statistics of Y are given as,


$$\delta_s = \frac{1}{s}\sum_{j=0}^{s-1} (-1)^{j}\binom{s-1}{j} E(Y_{s-j:s}).$$


With the LM statistics of Y defined in this way, we may introduce some basic statistical measures
based on them for the E-DML distribution: the LM mean $= \delta_1$, the LM coefficient of
variation $= \delta_2/\delta_1$, the LM coefficient of Sk $= \delta_3/\delta_2$ and the LM coefficient of Kurt $= \delta_4/\delta_2$.


5. Estimation methods 


In this section, we discuss the method of maximum likelihood estimation (MLE) to estimate the parameter
vector $\Theta = (\alpha, \beta)^{T}$ of the E-DML model. Assume that $y_1, y_2, \ldots, y_n$ are the observed values of a random sample
from the E-DML distribution. The log-likelihood function (LL) can be expressed as,


$$LL(y;\alpha,\beta) = -n\beta\log(1-\log\alpha) + \sum_{i=1}^{n}\log[A(y_i+1;\alpha,\beta) - A(y_i;\alpha,\beta)]. \quad (4)$$

Differentiating Equation (4) with respect to the parameters $\alpha$ and $\beta$, we obtain the nonlinear
likelihood equations, which are as follows,


$$\frac{n\beta}{\alpha(1-\log\alpha)} + \beta\sum_{i=1}^{n}\frac{W_1^{\beta-1}(y_i+1;\alpha)\,W_2(y_i+1;\alpha) - W_1^{\beta-1}(y_i;\alpha)\,W_2(y_i;\alpha)}{A(y_i+1;\alpha,\beta) - A(y_i;\alpha,\beta)} = 0 \quad (5)$$


and

$$-n\log(1-\log\alpha) + \sum_{i=1}^{n}\frac{A(y_i+1;\alpha,\beta)\log W_1(y_i+1;\alpha) - A(y_i;\alpha,\beta)\log W_1(y_i;\alpha)}{A(y_i+1;\alpha,\beta) - A(y_i;\alpha,\beta)} = 0, \quad (6)$$


respectively, where $W_1(y;\alpha) = \log\alpha\,(y\alpha^{2y}) + (1-\alpha^{y})(1-\log\alpha)$, so that $A(y;\alpha,\beta) = W_1^{\beta}(y;\alpha)$, and
$W_2(y;\alpha) = \partial W_1(y;\alpha)/\partial\alpha = \alpha^{-1}[\,y\alpha^{2y}(1 + 2y\log\alpha) - y\alpha^{y}(1-\log\alpha) - (1-\alpha^{y})\,]$.

The solutions of the likelihood Equations (5) and (6) provide the MLEs of $\Theta = (\alpha, \beta)^{T}$, say $\hat{\Theta} = (\hat{\alpha}, \hat{\beta})^{T}$. They
can be calculated using a numerical approach such as the Newton-Raphson method.

The E-DML distribution may be shown to satisfy the regularity conditions, which hold for $\Theta$ in the interior of the
parameter space, but not on the boundary, as seen in Cox and Hinkley (1979). As a result, the MLE vector $\hat{\Theta}$
is consistent and tends to a normal distribution as n tends to $\infty$.


6. Data analysis


The advantage of the E-DML distribution over other competing distributions is illustrated in this section
using a healthcare data set. The data modelling proceeds as follows:

• The MLEs and standard errors (se) are computed using the maxlogL function in R.


• Model comparisons are made with the help of the Akaike Information Criterion (AIC), the corrected Akaike
Information Criterion (AICC), and the chi-square ($\chi^{2}$) statistic and its p-value, computed in R.


• A visual comparison of the data set with the estimated pmf is made.


The competitor models against which the E-DML distribution is compared in this study are presented in Table 5.


Table 5: Models competing against the E-DML distribution. 


Models                             Abbreviations   References
Discrete Inverse Rayleigh          DIR             Hussain and Ahmad (2014)
Discrete Rayleigh                  DR              Roy (2004)
Poisson                            Poi             Poisson (1837)
Discrete Inverse Weibull           DIW             Jazi et al. (2010)
One parameter Discrete Lindley     DLi-I           Gómez-Déniz and Calderín-Ojeda (2011)
Two parameter Discrete Lindley     DLi-II          Bakouch et al. (2014)
Three parameter Discrete Lindley   DLi-III         Eliwa et al. (2020)


The distribution having the highest p-value and lowest values of AIC and AICC is said to have the best 
fit compared to the other models. 

The data comes from a study by Chan et al. (2010), who looked into the influence of a corticosteroid on 
cyst formation in mouse foetuses at University College London’s Institute of Child Health. The kidneys of 
embryonic mice were cultivated, and a random sample was given steroids. Table 6 shows the number of cysts 
found in the kidneys after administering steroids. 


Table 6: Number of cysts in kidneys. 


Number               0    1    2    3    4    5    6    7    8    9    10   11
Observed frequency   65   14   10   6    4    2    2    2    1    1    1    2

Table 7: MLE for number of cysts in kidneys.


Model     MLE (se)
DIR       0.554 (0.049)
DR        0.901 (0.009)
Poi       1.390 (0.112)
DIW       1.049 (0.146); 0.581 (0.048)
DLi-I     0.436 (0.026)
DLi-II    0.581 (0.045); 0.001 (0.058)
DLi-III   0.582 (0.005); 358.728 (11863.37); 0.001 (20.698)
E-DML     α̂ = 0.775 (0.042); β̂ = 0.232 (0.051)


Table 8: Goodness-of-fit metrics for the number of cysts in kidneys. 


Model     −L        AIC       AICC      χ²        p-value
DIR       186.547   375.094   375.131   40.456    <0.001
DR        277.778   557.556   557.593   306.515   <0.001
Poi       246.210   494.420   494.457   89.277    <0.001
DIW       172.935   349.869   349.982   6.445     0.092
DLi-I     189.110   380.220   380.257   34.635    <0.001
DLi-II    178.767   361.534   361.646   19.091    0.0003
DLi-III   178.767   363.533   363.759   19.096    <0.0001
E-DML     167.192   338.385   338.550   6.398     0.7807


The MLEs and se of the E-DML distribution and the other competing distributions are reported in Table 7.

The goodness-of-fit metrics for the number of cysts in kidneys (AIC, AICC, $\chi^{2}$ and p-value) are
reported in Table 8.

From Table 8, we can see that the E-DML distribution has the highest p-value (0.7807) and the smallest
values of AIC (338.385), AICC (338.550) and $\chi^{2}$ (6.398). The histogram of the count data set of cysts in
kidneys, together with the estimated pmf of the E-DML distribution, is shown in Figure 3.
Figure 3 illustrates the plot of the estimated pmf, which supports the result in Table 8.
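
The reported fit can be checked directly from the frequencies in Table 6. The sketch below is an illustration rather than the authors' R workflow: it evaluates the E-DML log-likelihood at the estimates reported in Table 7 ($\hat{\alpha} = 0.775$, $\hat{\beta} = 0.232$), assuming the pmf $g(y) = [A(y+1;\alpha,\beta) - A(y;\alpha,\beta)]/(1-\log\alpha)^{\beta}$; the names `cyst_counts`, `edml_A` and `edml_loglik` are hypothetical.

```python
import math

# Observed frequencies of the number of cysts (Table 6): {count: frequency}.
cyst_counts = {0: 65, 1: 14, 2: 10, 3: 6, 4: 4, 5: 2, 6: 2, 7: 2, 8: 1, 9: 1, 10: 1, 11: 2}


def edml_A(y, alpha, beta):
    """A(y; alpha, beta) = [(1 - log a)(1 - a^y) + log(a) * y * a^(2y)]^beta."""
    log_a = math.log(alpha)
    base = (1.0 - log_a) * (1.0 - alpha ** y) + log_a * y * alpha ** (2 * y)
    return max(base, 0.0) ** beta


def edml_loglik(alpha, beta, counts):
    """Log-likelihood of grouped count data under the E-DML model."""
    n = sum(counts.values())
    ll = -n * beta * math.log(1.0 - math.log(alpha))
    for y, freq in counts.items():
        ll += freq * math.log(edml_A(y + 1, alpha, beta) - edml_A(y, alpha, beta))
    return ll


ll_hat = edml_loglik(0.775, 0.232, cyst_counts)
aic = 2 * 2 - 2 * ll_hat  # two parameters
print(f"-L = {-ll_hat:.3f}, AIC = {aic:.3f}")
```

The printed values can be compared with the E-DML row of Table 8. To obtain the estimates themselves, the same function could be handed to a general-purpose numerical optimiser.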


Figure 3: Histogram and estimated pmf of the number of cysts in kidneys using steroids.


7. Conclusion


A two-parameter exponentiated model based on the discrete modified Lindley distribution is proposed. The new
distribution is found to be more versatile, to have a longer right tail, and to have a simpler shape than the
parent distribution and the competing distributions considered in the paper. Various distributional
properties are discussed, and L-moments and order statistics are investigated. The parameters are estimated
using maximum likelihood estimation, and the fit of the distribution is compared with that of competing
models using goodness-of-fit measures on a data set recording the number of cysts found in kidneys. We found
that the new distribution is useful in data modelling and, unlike its competitors, fits these data well, so it
could be used to model count data from the medical sector.


Chapter 13


Length Biased Weighted New Quasi Lindley Distribution


Statistical Properties and Applications 
Aafaq A Rather 


1. Introduction 


The study of weighted distributions is useful in distribution theory because it provides a new understanding
of existing standard probability distributions and offers methods for extending them to model lifetime data;
the additional parameters introduced in the model make it more flexible. Weighted distributions occur in
modeling clustered sampling, heterogeneity, and extraneous variation in the dataset. The concept of weighted
distributions was first introduced by Fisher
(1934) to model ascertainment biases which were later formalized by Rao (1965) in a unifying theory for 
problems where the observations fell in a non-experimental, non-replicated and non-random manner. When 
observations are recorded by an investigator in nature according to certain stochastic models, the distribution 
of the recorded observations will not have the original distribution unless every observation is given an equal 
chance of being recorded. Weighted models were formulated in such situations to record the observations 
according to some weighted function. The weighted distribution reduces to a length biased distribution when 
the weight function considers only the length of the units. The concept of length biased sampling was first 
introduced by Cox (1969) and Zelen (1974). Warren (1975) was the first to apply the size biased distributions 
in connection with sampling wood cells. Patil and Rao (1978) studied weighted distributions and size biased 
sampling with applications to wildlife populations and human families. Van Deusen (1986) arrived at a size 
biased distribution theory independently and applied it in fitting assumed distributions to data arising from 
horizontal point sampling. More generally, when the sampling mechanism selects units with a probability 
proportional to some measure of unit size, the resulting distribution is called size-biased. There are various 
good sources which provide a detailed description of weighted distributions. Different authors have reviewed 
and studied the various weighted probability models and illustrated their applications in different fields. 
Weighted distributions are applied in various research areas related to reliability, biomedicine, ecology and 
branching processes. Afaq et al. (2016) have obtained the length biased weighted version of the Lomax 
distribution with properties and applications. Reyad et al. (2017) obtained the length biased weighted Frechet 
distribution with properties and estimation. Mudasir and Ahmad (2018) discussed the characterization and 
estimation of the length biased Nakagami distribution. Para and Jan (2018) introduced the weighted Pareto
type II distribution as a new model for handling medical science data and studied its statistical properties
and applications. Rather and Subramanian (2019) obtained the length biased Erlang truncated exponential
distribution and applied it to lifetime data. Rather and Ozel (2020) discussed the weighted power Lindley
distribution with an application to real lifetime data. Hassan, Dar, Peer and Para (2019) obtained the weighted
version of the Pranav distribution and applied it to real life data. Hassan, Wani and Para (2018) discussed the
weighted three parameter quasi Lindley distribution with properties and applications. Ganaie, Rajagopalan and
Rather (2019) discussed the length biased Aradhana distribution with applications. Recently, Ganaie,
Rajagopalan and Rather (2020) discussed the weighted two parameter quasi Shanker distribution with its
properties and applications, and showed that it is more reliable and efficient than the classical distribution.

Shanker and Ghebretsadik (2013) introduced the new quasi Lindley distribution, a two-parameter
probability distribution, and derived its mathematical and statistical properties, such as
moments, skewness, kurtosis, failure rate and mean residual life functions, and stochastic ordering. The
expressions for the failure rate and mean residual life functions and the stochastic ordering of
the new quasi Lindley distribution show its flexibility over the Lindley, exponential and
quasi Lindley distributions. Also, the one parameter Lindley distribution is a particular case of the new
quasi Lindley distribution. Parameter estimation is discussed using the method of moments and
maximum likelihood estimation. The new quasi Lindley distribution has been fitted to
a number of data sets related to survival times, grouped mortality data and waiting times to test its goodness
of fit, and it is observed that it provides a closer fit than the Lindley
and quasi Lindley distributions.


2. Length biased weighted new quasi Lindley (LBWNQL) distribution


The probability density function of the new quasi Lindley distribution is given by,


$$f(x;\theta,\alpha) = \frac{\theta^{2}}{\theta^{2}+\alpha}(\theta+\alpha x)e^{-\theta x};\quad x>0,\ \theta>0,\ \alpha>-\theta^{2} \quad (1)$$


and the cumulative distribution function of the new quasi Lindley distribution is given by,


$$F(x;\theta,\alpha) = 1 - \frac{\theta^{2}+\alpha+\alpha\theta x}{\theta^{2}+\alpha}\,e^{-\theta x}. \quad (2)$$

Suppose X is a non-negative random variable with probability density function $f(x)$. Let $w(x)$ be a
non-negative weight function; then the probability density function of the weighted random variable $X_w$ is given by,


$$f_w(x) = \frac{w(x)f(x)}{E(w(x))},\quad x>0,$$

where $E(w(x)) = \int w(x)f(x)\,dx < \infty$.

We should note that different choices of the weight function $w(x)$ give different weighted distributions.
When $w(x) = x^{c}$, the result is known as a weighted distribution, and when $w(x) = x$, the result is known
as a length biased distribution. In this paper, we obtain the length biased version of the new quasi
Lindley distribution by taking $c = 1$ in the weight $x^{c}$. Therefore, the
probability density function of the length biased weighted new quasi Lindley distribution is given by,


$$f_l(x;\theta,\alpha) = \frac{x f(x;\theta,\alpha)}{E(x)},\quad x>0, \quad (3)$$


where,


$$E(x) = \int_0^{\infty} x f(x;\theta,\alpha)\,dx = \frac{\theta^{2}+2\alpha}{\theta(\theta^{2}+\alpha)}. \quad (4)$$

Figure 1: Pdf plot of length biased weighted new quasi Lindley distribution.


Figure 2: Cdf plot of length biased weighted new quasi Lindley distribution.


On substituting equations (1) and (4) in equation (3), we obtain the probability density function of the
length biased weighted new quasi Lindley distribution,

$$f_l(x;\theta,\alpha) = \frac{x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}, \quad (5)$$

and the cumulative distribution function of the length biased weighted new quasi Lindley distribution is
given by,


$$F_l(x;\theta,\alpha) = \int_0^{x} f_l(x;\theta,\alpha)\,dx = \int_0^{x} \frac{x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\,dx$$

$$F_l(x;\theta,\alpha) = \frac{\theta^{3}}{\theta^{2}+2\alpha}\int_0^{x} x(\theta+\alpha x)e^{-\theta x}\,dx. \quad (6)$$


After the simplification of equation (6), we obtain the cumulative distribution function of the length
biased weighted new quasi Lindley distribution,


$$F_l(x;\theta,\alpha) = \frac{1}{\theta^{2}+2\alpha}\left(\theta^{2}\gamma(2,\theta x) + \alpha\gamma(3,\theta x)\right), \quad (7)$$

where $\gamma(s, z) = \int_0^{z} t^{s-1}e^{-t}\,dt$ denotes the lower incomplete gamma function.
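
The density (5) and distribution function (7) can be evaluated with the standard library alone, because for positive integer $k$ the lower incomplete gamma function has the closed form $\gamma(k, z) = (k-1)!\,[1 - e^{-z}\sum_{m=0}^{k-1} z^{m}/m!]$. The sketch below is a minimal illustration; the function names and the parameter values used in the example are hypothetical.

```python
import math


def lower_gamma_int(k, z):
    """Lower incomplete gamma function gamma(k, z) for a positive integer k."""
    partial = sum(z ** m / math.factorial(m) for m in range(k))
    return math.factorial(k - 1) * (1.0 - math.exp(-z) * partial)


def lbwnql_pdf(x, theta, alpha):
    """Density of the length biased weighted new quasi Lindley distribution, Equation (5)."""
    return x * theta ** 3 / (theta ** 2 + 2 * alpha) * (theta + alpha * x) * math.exp(-theta * x)


def lbwnql_cdf(x, theta, alpha):
    """Distribution function, Equation (7)."""
    z = theta * x
    numerator = theta ** 2 * lower_gamma_int(2, z) + alpha * lower_gamma_int(3, z)
    return numerator / (theta ** 2 + 2 * alpha)


theta, alpha = 1.0, 2.0  # illustrative values only
for x in (0.5, 1.0, 3.0, 10.0):
    print(x, lbwnql_pdf(x, theta, alpha), lbwnql_cdf(x, theta, alpha))
```

As x grows, the printed distribution function values should approach 1, since $\gamma(2,z) \to 1$ and $\gamma(3,z) \to 2$.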


3. Reliability analysis 


In this section, we obtain the reliability, hazard and reverse hazard rate functions for the proposed
length biased weighted new quasi Lindley distribution.

The reliability function or the survival function of the length biased weighted new quasi Lindley
distribution is given by,


$$R(x) = 1 - F_l(x;\theta,\alpha)$$


$$R(x) = 1 - \frac{1}{\theta^{2}+2\alpha}\left(\theta^{2}\gamma(2,\theta x) + \alpha\gamma(3,\theta x)\right).$$

The hazard function is also known as the hazard rate, instantaneous failure rate or force of mortality and
is given by,


$$h(x) = \frac{f_l(x;\theta,\alpha)}{R(x)}$$

$$h(x) = \frac{x\theta^{3}(\theta+\alpha x)e^{-\theta x}}{(\theta^{2}+2\alpha) - \left(\theta^{2}\gamma(2,\theta x) + \alpha\gamma(3,\theta x)\right)}.$$


Figure 3: Reliability plot of length biased weighted new quasi Lindley distribution.

Figure 4: Hazard plot of length biased weighted new quasi Lindley distribution.


The reverse hazard function of the length biased weighted new quasi Lindley distribution is given by,

$$h_r(x) = \frac{f_l(x;\theta,\alpha)}{F_l(x;\theta,\alpha)}$$


$$h_r(x) = \frac{x\theta^{3}(\theta+\alpha x)e^{-\theta x}}{\theta^{2}\gamma(2,\theta x) + \alpha\gamma(3,\theta x)}.$$


4. Statistical properties 

In this section we shall discuss the structural properties of length biased weighted new quasi Lindley 
distribution, especially its moments, harmonic mean, moment generating and characteristic functions. 

4.1 Moments 


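Let X denote a random variable following the length biased weighted new quasi Lindley distribution with parameters
$\theta$ and $\alpha$; then the $r$th order moment $E(X^{r})$ of the distribution is obtained as,


$$E(X^{r}) = \mu'_r = \int_0^{\infty} x^{r} f_l(x;\theta,\alpha)\,dx$$

$$= \int_0^{\infty} x^{r}\,\frac{x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\,dx$$

$$= \frac{\theta^{3}}{\theta^{2}+2\alpha}\int_0^{\infty} x^{r+1}(\theta+\alpha x)e^{-\theta x}\,dx$$

$$= \frac{\theta^{3}}{\theta^{2}+2\alpha}\left(\theta\int_0^{\infty} x^{(r+2)-1}e^{-\theta x}\,dx + \alpha\int_0^{\infty} x^{(r+3)-1}e^{-\theta x}\,dx\right).$$

After simplification, we obtain,


$$E(X^{r}) = \mu'_r = \frac{\theta^{2}\Gamma(r+2) + \alpha\Gamma(r+3)}{\theta^{r}(\theta^{2}+2\alpha)}. \quad (8)$$

Substituting $r = 1, 2, 3, 4$ in equation (8), we obtain the first four moments of the distribution as,

$$E(X) = \mu'_1 = \frac{2\theta^{2}+6\alpha}{\theta(\theta^{2}+2\alpha)}$$

$$E(X^{2}) = \mu'_2 = \frac{6\theta^{2}+24\alpha}{\theta^{2}(\theta^{2}+2\alpha)}$$

$$E(X^{3}) = \mu'_3 = \frac{24\theta^{2}+120\alpha}{\theta^{3}(\theta^{2}+2\alpha)}$$

$$E(X^{4}) = \mu'_4 = \frac{120\theta^{2}+720\alpha}{\theta^{4}(\theta^{2}+2\alpha)}$$


$$\text{Variance} = \frac{6\theta^{2}+24\alpha}{\theta^{2}(\theta^{2}+2\alpha)} - \left(\frac{2\theta^{2}+6\alpha}{\theta(\theta^{2}+2\alpha)}\right)^{2}$$


$$\text{Standard deviation } \sigma = \sqrt{\frac{6\theta^{2}+24\alpha}{\theta^{2}(\theta^{2}+2\alpha)} - \frac{(2\theta^{2}+6\alpha)^{2}}{\theta^{2}(\theta^{2}+2\alpha)^{2}}}$$


$$\text{Coefficient of variation} = \frac{\sigma}{\mu'_1} = \sqrt{\frac{6\theta^{2}+24\alpha}{\theta^{2}(\theta^{2}+2\alpha)} - \frac{(2\theta^{2}+6\alpha)^{2}}{\theta^{2}(\theta^{2}+2\alpha)^{2}}}\;\times\;\frac{\theta(\theta^{2}+2\alpha)}{2\theta^{2}+6\alpha}.$$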
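
The closed-form moment in equation (8) is straightforward to evaluate. The following sketch, with hypothetical function names and illustrative parameter values, computes the raw moments and derives the variance, standard deviation and coefficient of variation from them.

```python
import math


def lbwnql_raw_moment(r, theta, alpha):
    """r-th raw moment from Equation (8)."""
    numerator = theta ** 2 * math.gamma(r + 2) + alpha * math.gamma(r + 3)
    return numerator / (theta ** r * (theta ** 2 + 2 * alpha))


def lbwnql_summary(theta, alpha):
    """Mean, variance, standard deviation and coefficient of variation."""
    mean = lbwnql_raw_moment(1, theta, alpha)
    variance = lbwnql_raw_moment(2, theta, alpha) - mean ** 2
    sd = math.sqrt(variance)
    return mean, variance, sd, sd / mean


# Illustrative parameter values (not taken from the fitted models).
mean, variance, sd, cv = lbwnql_summary(theta=1.0, alpha=1.0)
print(mean, variance, sd, cv)
```

For $\theta = \alpha = 1$, equation (8) gives $\mu'_1 = 8/3$ and $\mu'_2 = 10$, so the variance is $10 - 64/9 = 26/9$.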


4.2 Harmonic mean 


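The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. The harmonic mean for the
proposed length biased weighted new quasi Lindley distribution is given by,


$$H.M = E\left(\frac{1}{x}\right) = \int_0^{\infty}\frac{1}{x}\,f_l(x;\theta,\alpha)\,dx$$

$$= \int_0^{\infty}\frac{1}{x}\,\frac{x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\,dx$$

$$= \frac{\theta^{3}}{\theta^{2}+2\alpha}\int_0^{\infty}(\theta+\alpha x)e^{-\theta x}\,dx$$

$$= \frac{\theta^{3}}{\theta^{2}+2\alpha}\left(\theta\int_0^{\infty} x^{1-1}e^{-\theta x}\,dx + \alpha\int_0^{\infty} x^{2-1}e^{-\theta x}\,dx\right).$$


After simplification, we obtain,


$$H.M = \frac{\theta}{\theta^{2}+2\alpha}\left(\theta^{2}\Gamma(1) + \alpha\Gamma(2)\right) = \frac{\theta(\theta^{2}+\alpha)}{\theta^{2}+2\alpha}.$$

4.3 Moment generating function


The moment generating function of a random variable X is the expected value of $e^{tX}$. The moment
generating function for the length biased weighted new quasi Lindley distribution is given by,


$$M_X(t) = E(e^{tx}) = \int_0^{\infty} e^{tx} f_l(x;\theta,\alpha)\,dx$$


$$= \int_0^{\infty}\left(1 + tx + \frac{(tx)^{2}}{2!} + \cdots\right) f_l(x;\theta,\alpha)\,dx = \int_0^{\infty}\sum_{j=0}^{\infty}\frac{t^{j}}{j!}\,x^{j} f_l(x;\theta,\alpha)\,dx$$


$$= \sum_{j=0}^{\infty}\frac{t^{j}}{j!}\,\mu'_j = \sum_{j=0}^{\infty}\frac{t^{j}}{j!}\,\frac{\theta^{2}\Gamma(j+2) + \alpha\Gamma(j+3)}{\theta^{j}(\theta^{2}+2\alpha)}$$

$$\Rightarrow M_X(t) = \frac{1}{\theta^{2}+2\alpha}\sum_{j=0}^{\infty}\frac{t^{j}}{j!\,\theta^{j}}\left(\theta^{2}\Gamma(j+2) + \alpha\Gamma(j+3)\right).$$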


4.4 Characteristic function 


The characteristic function is defined for any real valued random variable and completely
determines its probability distribution. It always exists, even
if the moment generating function does not. The characteristic function of the length biased weighted
new quasi Lindley distribution is given by,


$$\varphi_X(t) = M_X(it)$$


$$\Rightarrow M_X(it) = \frac{1}{\theta^{2}+2\alpha}\sum_{j=0}^{\infty}\frac{(it)^{j}}{j!\,\theta^{j}}\left(\theta^{2}\Gamma(j+2) + \alpha\Gamma(j+3)\right).$$


5. Order statistics 


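Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ denote the order statistics of a random sample $X_1, X_2, \ldots, X_n$ drawn from a continuous
population with cumulative distribution function $F_X(x)$ and probability density function $f_X(x)$; then the
probability density function of the $r$th order statistic $X_{(r)}$ is given by,


$$f_{X(r)}(x) = \frac{n!}{(r-1)!\,(n-r)!}\,f_X(x)\,[F_X(x)]^{r-1}\,[1 - F_X(x)]^{n-r}, \quad (9)$$

for $r = 1, 2, 3, \ldots, n$.


Using equations (5) and (7) in equation (9), we obtain the probability density function of the $r$th order
statistic of the length biased weighted new quasi Lindley distribution, which is given by,


$$f_{X(r)}(x) = \frac{n!}{(r-1)!\,(n-r)!}\,\frac{x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\left(\frac{\theta^{2}\gamma(2,\theta x) + \alpha\gamma(3,\theta x)}{\theta^{2}+2\alpha}\right)^{r-1}$$

$$\times\left(1 - \frac{\theta^{2}\gamma(2,\theta x) + \alpha\gamma(3,\theta x)}{\theta^{2}+2\alpha}\right)^{n-r}.$$

Therefore, the probability density function of the highest order statistic $X_{(n)}$ of the length biased weighted
new quasi Lindley distribution is given by,


$$f_{X(n)}(x) = \frac{n x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\left(\frac{\theta^{2}\gamma(2,\theta x) + \alpha\gamma(3,\theta x)}{\theta^{2}+2\alpha}\right)^{n-1}$$


and the probability density function of the first order statistic $X_{(1)}$ of the length biased weighted new quasi
Lindley distribution is given by,


$$f_{X(1)}(x) = \frac{n x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\left(1 - \frac{\theta^{2}\gamma(2,\theta x) + \alpha\gamma(3,\theta x)}{\theta^{2}+2\alpha}\right)^{n-1}.$$


6. Likelihood ratio test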


Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from the new quasi Lindley distribution or the length biased
weighted new quasi Lindley distribution. We test the hypothesis,


$$H_0: f(x) = f(x;\theta,\alpha) \quad \text{against} \quad H_1: f(x) = f_l(x;\theta,\alpha).$$


To test whether the random sample of size n comes from the new quasi Lindley
distribution or the length biased weighted new quasi Lindley distribution, the following test statistic is used.


$$\Delta = \frac{L_1}{L_0} = \prod_{i=1}^{n}\frac{f_l(x_i;\theta,\alpha)}{f(x_i;\theta,\alpha)}$$


$$\frac{L_1}{L_0} = \prod_{i=1}^{n}\left(\frac{x_i\,\theta(\theta^{2}+\alpha)}{\theta^{2}+2\alpha}\right)$$


$$\frac{L_1}{L_0} = \left(\frac{\theta(\theta^{2}+\alpha)}{\theta^{2}+2\alpha}\right)^{n}\prod_{i=1}^{n} x_i.$$


We should reject the null hypothesis if,


$$\Delta = \left(\frac{\theta(\theta^{2}+\alpha)}{\theta^{2}+2\alpha}\right)^{n}\prod_{i=1}^{n} x_i > k.$$

Equivalently, we reject the null hypothesis if,

$$\Delta^{*} = \prod_{i=1}^{n} x_i > k\left(\frac{\theta^{2}+2\alpha}{\theta(\theta^{2}+\alpha)}\right)^{n}$$

$$\Delta^{*} = \prod_{i=1}^{n} x_i > k^{*}, \quad \text{where } k^{*} = k\left(\frac{\theta^{2}+2\alpha}{\theta(\theta^{2}+\alpha)}\right)^{n}.$$


Thus, for a large sample of size n, $2\log\Delta$ is distributed as a Chi-square distribution with one degree of
freedom, and the p-value is obtained from it. Also, we reject the null hypothesis when the probability
value

$p(\Delta^{*} > \beta^{*})$, where $\beta^{*} = \prod_{i=1}^{n} x_i$ is the observed value of the statistic $\Delta^{*}$, is less than a specified
level of significance.




7. Bonferroni and Lorenz curves 


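The Bonferroni and Lorenz curves are used not only in economics to study the distribution of income,
wealth or poverty, but also in other fields like reliability, medicine, insurance and
demography. The Bonferroni and Lorenz curves are given by,


$$B(p) = \frac{1}{p\mu'_1}\int_0^{q} x f_l(x;\theta,\alpha)\,dx$$


$$\text{and} \quad L(p) = pB(p) = \frac{1}{\mu'_1}\int_0^{q} x f_l(x;\theta,\alpha)\,dx,$$


$$\text{where } \mu'_1 = E(X) = \frac{2\theta^{2}+6\alpha}{\theta(\theta^{2}+2\alpha)} \text{ and } q = F^{-1}(p).$$


$$B(p) = \frac{\theta(\theta^{2}+2\alpha)}{p(2\theta^{2}+6\alpha)}\int_0^{q}\frac{x^{2}\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\,dx$$


$$B(p) = \frac{\theta^{4}}{p(2\theta^{2}+6\alpha)}\int_0^{q} x^{2}(\theta+\alpha x)e^{-\theta x}\,dx$$


$$B(p) = \frac{\theta^{4}}{p(2\theta^{2}+6\alpha)}\left(\theta\int_0^{q} x^{3-1}e^{-\theta x}\,dx + \alpha\int_0^{q} x^{4-1}e^{-\theta x}\,dx\right).$$

After simplification, we obtain,


$$B(p) = \frac{1}{p(2\theta^{2}+6\alpha)}\left(\theta^{2}\gamma(3,\theta q) + \alpha\gamma(4,\theta q)\right)$$


$$\text{and} \quad L(p) = \frac{1}{2\theta^{2}+6\alpha}\left(\theta^{2}\gamma(3,\theta q) + \alpha\gamma(4,\theta q)\right).$$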
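
As a numerical illustration, the Lorenz and Bonferroni curves can be traced by choosing a quantile $q$, computing $p = F_l(q;\theta,\alpha)$ and then evaluating the incomplete gamma expressions. The sketch below assumes the normalisation $L(p) = [\theta^{2}\gamma(3,\theta q) + \alpha\gamma(4,\theta q)]/(2\theta^{2}+6\alpha)$, which gives $L(1) = 1$; the function names and parameter values are hypothetical.

```python
import math


def lower_gamma_int(k, z):
    """Lower incomplete gamma function gamma(k, z) for a positive integer k."""
    partial = sum(z ** m / math.factorial(m) for m in range(k))
    return math.factorial(k - 1) * (1.0 - math.exp(-z) * partial)


def lbwnql_cdf(q, theta, alpha):
    """Distribution function of the length biased weighted new quasi Lindley distribution."""
    z = theta * q
    return (theta ** 2 * lower_gamma_int(2, z) + alpha * lower_gamma_int(3, z)) / (theta ** 2 + 2 * alpha)


def lorenz_bonferroni(q, theta, alpha):
    """Return (p, L(p), B(p)) at the quantile q, where p = F_l(q)."""
    z = theta * q
    p = lbwnql_cdf(q, theta, alpha)
    lorenz = (theta ** 2 * lower_gamma_int(3, z) + alpha * lower_gamma_int(4, z)) / (2 * theta ** 2 + 6 * alpha)
    return p, lorenz, lorenz / p


for q in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(q, lorenz_bonferroni(q, theta=1.0, alpha=1.0))
```

Parameterising by $q$ rather than $p$ avoids inverting the distribution function, which has no closed form here.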


8. Entropies 


The concept of entropies is important in different areas such as probability and statistics, physics, 
communication theory and economics. Entropy is also called the degree of randomness or disorder in a 
system. Entropies quantify the diversity, uncertainty, or randomness of a system. Entropy of a random 
variable X is a measure of variation of the uncertainty. 


8.1 Renyi entropy 


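The Renyi entropy is important in ecology and statistics as an index of diversity. The entropy is named after
Alfred Renyi. The Renyi entropy is also important in quantum information, where it can be used as a measure of
entanglement. For a given probability distribution, the Renyi entropy is given by,


$$e(\beta) = \frac{1}{1-\beta}\log\left(\int_0^{\infty} f_l^{\beta}(x;\theta,\alpha)\,dx\right),$$

where $\beta > 0$ and $\beta \neq 1$.


$$e(\beta) = \frac{1}{1-\beta}\log\int_0^{\infty}\left(\frac{x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\right)^{\beta} dx$$


$$e(\beta) = \frac{1}{1-\beta}\log\left(\left(\frac{\theta^{3}}{\theta^{2}+2\alpha}\right)^{\beta}\int_0^{\infty} x^{\beta}e^{-\theta\beta x}(\theta+\alpha x)^{\beta}\,dx\right). \quad (10)$$


Using the Binomial expansion in (10), we obtain,


$$e(\beta) = \frac{1}{1-\beta}\log\left(\left(\frac{\theta^{3}}{\theta^{2}+2\alpha}\right)^{\beta}\sum_{i=0}^{\infty}\binom{\beta}{i}\theta^{\beta-i}(\alpha x)^{i}\int_0^{\infty} x^{\beta}e^{-\theta\beta x}\,dx\right)$$


$$e(\beta) = \frac{1}{1-\beta}\log\left(\left(\frac{\theta^{3}}{\theta^{2}+2\alpha}\right)^{\beta}\sum_{i=0}^{\infty}\binom{\beta}{i}\theta^{\beta-i}\alpha^{i}\int_0^{\infty} x^{(\beta+i+1)-1}e^{-\theta\beta x}\,dx\right)$$


$$e(\beta) = \frac{1}{1-\beta}\log\left(\left(\frac{\theta^{3}}{\theta^{2}+2\alpha}\right)^{\beta}\sum_{i=0}^{\infty}\binom{\beta}{i}\theta^{\beta-i}\alpha^{i}\,\frac{\Gamma(\beta+i+1)}{(\theta\beta)^{\beta+i+1}}\right).$$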

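8.2 Tsallis entropy


A generalization of Boltzmann-Gibbs (B-G) statistical properties initiated by Tsallis has attracted a great
deal of attention. This generalization of B-G statistics was first proposed by introducing the mathematical
expression of Tsallis entropy (Tsallis, 1988) for a continuous random variable, defined as follows,


$$S_{\lambda} = \frac{1}{\lambda-1}\left(1 - \int_0^{\infty} f_l^{\lambda}(x;\theta,\alpha)\,dx\right)$$


$$S_{\lambda} = \frac{1}{\lambda-1}\left(1 - \int_0^{\infty}\left(\frac{x\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x)e^{-\theta x}\right)^{\lambda} dx\right)$$


$$S_{\lambda} = \frac{1}{\lambda-1}\left(1 - \left(\frac{\theta^{3}}{\theta^{2}+2\alpha}\right)^{\lambda}\int_0^{\infty} x^{\lambda}e^{-\lambda\theta x}(\theta+\alpha x)^{\lambda}\,dx\right). \quad (11)$$


Using the Binomial expansion in equation (11), we get,


$$S_{\lambda} = \frac{1}{\lambda-1}\left(1 - \left(\frac{\theta^{3}}{\theta^{2}+2\alpha}\right)^{\lambda}\sum_{i=0}^{\infty}\binom{\lambda}{i}\theta^{\lambda-i}(\alpha x)^{i}\int_0^{\infty} x^{\lambda}e^{-\lambda\theta x}\,dx\right)$$

$$S_{\lambda} = \frac{1}{\lambda-1}\left(1 - \left(\frac{\theta^{3}}{\theta^{2}+2\alpha}\right)^{\lambda}\sum_{i=0}^{\infty}\binom{\lambda}{i}\theta^{\lambda-i}\alpha^{i}\int_0^{\infty} x^{(\lambda+i+1)-1}e^{-\lambda\theta x}\,dx\right)$$

$$S_{\lambda} = \frac{1}{\lambda-1}\left(1 - \left(\frac{\theta^{3}}{\theta^{2}+2\alpha}\right)^{\lambda}\sum_{i=0}^{\infty}\binom{\lambda}{i}\theta^{\lambda-i}\alpha^{i}\,\frac{\Gamma(\lambda+i+1)}{(\lambda\theta)^{\lambda+i+1}}\right).$$


9. Maximum likelihood estimation and Fisher’s information matrix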


In this section, we discuss the maximum likelihood estimation of the parameters of the
length biased weighted new quasi Lindley distribution and its Fisher’s information matrix. Let
$X_1, X_2, \ldots, X_n$ be a random sample of size n from the length biased weighted new quasi Lindley distribution;
then the likelihood function of the length biased weighted new quasi Lindley distribution is given by,


$$L(x;\theta,\alpha) = \prod_{i=1}^{n} f_l(x_i;\theta,\alpha)$$


$$L(x;\theta,\alpha) = \prod_{i=1}^{n}\left(\frac{x_i\theta^{3}}{\theta^{2}+2\alpha}(\theta+\alpha x_i)e^{-\theta x_i}\right)$$

$$L(x;\theta,\alpha) = \frac{\theta^{3n}}{(\theta^{2}+2\alpha)^{n}}\prod_{i=1}^{n}\left(x_i(\theta+\alpha x_i)e^{-\theta x_i}\right).$$

The log likelihood function is given by,

$$\log L(x;\theta,\alpha) = 3n\log\theta - n\log(\theta^{2}+2\alpha) + \sum_{i=1}^{n}\log x_i + \sum_{i=1}^{n}\log(\theta+\alpha x_i) - \theta\sum_{i=1}^{n} x_i. \quad (12)$$


Differentiating the log likelihood equation (12) with respect to $\theta$ and $\alpha$ and equating to zero, we obtain
the normal equations,

$$\frac{\partial\log L}{\partial\theta} = \frac{3n}{\theta} - n\left(\frac{2\theta}{\theta^{2}+2\alpha}\right) + \sum_{i=1}^{n}\left(\frac{1}{\theta+\alpha x_i}\right) - \sum_{i=1}^{n} x_i = 0$$


$$\frac{\partial\log L}{\partial\alpha} = -n\left(\frac{2}{\theta^{2}+2\alpha}\right) + \sum_{i=1}^{n}\left(\frac{x_i}{\theta+\alpha x_i}\right) = 0.$$


Because of the complicated form of the likelihood equations, it is very difficult to solve
the system of non-linear equations algebraically. Therefore, we use R and Wolfram Mathematica for estimating the
required parameters of the proposed distribution.
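
A numerical solver for the normal equations needs the log-likelihood (12) and its two partial derivatives. The sketch below writes them out and can be used to check the derivatives against finite differences; it is an illustration only, the function names are hypothetical, and the small sample is made-up data rather than either of the data sets analysed in this chapter.

```python
import math


def lbwnql_loglik(theta, alpha, xs):
    """Log-likelihood of Equation (12) for a sample xs of positive values."""
    n = len(xs)
    return (3 * n * math.log(theta)
            - n * math.log(theta ** 2 + 2 * alpha)
            + sum(math.log(x) for x in xs)
            + sum(math.log(theta + alpha * x) for x in xs)
            - theta * sum(xs))


def lbwnql_score(theta, alpha, xs):
    """Partial derivatives of the log-likelihood with respect to theta and alpha."""
    n = len(xs)
    d_theta = (3 * n / theta
               - 2 * n * theta / (theta ** 2 + 2 * alpha)
               + sum(1.0 / (theta + alpha * x) for x in xs)
               - sum(xs))
    d_alpha = (-2 * n / (theta ** 2 + 2 * alpha)
               + sum(x / (theta + alpha * x) for x in xs))
    return d_theta, d_alpha


sample = [0.5, 1.2, 2.0, 3.3, 4.1]  # hypothetical data, for illustration only
print(lbwnql_loglik(0.9, 1.5, sample), lbwnql_score(0.9, 1.5, sample))
```

Setting both components of the score to zero reproduces the normal equations; in practice the pair of functions would be passed to a quasi-Newton or Newton-Raphson routine.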

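To obtain the confidence interval we use the asymptotic normality results. If $\hat{\lambda} = (\hat{\theta}, \hat{\alpha})$ denotes the
MLE of $\lambda = (\theta, \alpha)$, we can state the results as follows:


$$\sqrt{n}(\hat{\lambda} - \lambda) \to N_2(0, I^{-1}(\lambda)),$$


where $I(\lambda)$ is the Fisher’s information matrix, i.e.,


$$I(\lambda) = -\frac{1}{n}\begin{pmatrix} E\left(\dfrac{\partial^{2}\log L}{\partial\theta^{2}}\right) & E\left(\dfrac{\partial^{2}\log L}{\partial\theta\,\partial\alpha}\right) \\[2ex] E\left(\dfrac{\partial^{2}\log L}{\partial\alpha\,\partial\theta}\right) & E\left(\dfrac{\partial^{2}\log L}{\partial\alpha^{2}}\right) \end{pmatrix},$$


where,


$$E\left(\frac{\partial^{2}\log L}{\partial\theta^{2}}\right) = -\frac{3n}{\theta^{2}} - n\left(\frac{2(\theta^{2}+2\alpha) - 4\theta^{2}}{(\theta^{2}+2\alpha)^{2}}\right) - \sum_{i=1}^{n}\left(\frac{1}{(\theta+\alpha x_i)^{2}}\right)$$


$$E\left(\frac{\partial^{2}\log L}{\partial\alpha^{2}}\right) = n\left(\frac{4}{(\theta^{2}+2\alpha)^{2}}\right) - \sum_{i=1}^{n}\left(\frac{E(x_i^{2})}{(\theta+\alpha x_i)^{2}}\right)$$


$$E\left(\frac{\partial^{2}\log L}{\partial\theta\,\partial\alpha}\right) = E\left(\frac{\partial^{2}\log L}{\partial\alpha\,\partial\theta}\right) = n\left(\frac{4\theta}{(\theta^{2}+2\alpha)^{2}}\right) - \sum_{i=1}^{n}\left(\frac{E(x_i)}{(\theta+\alpha x_i)^{2}}\right).$$

Since $\lambda$ is unknown, we estimate $I^{-1}(\lambda)$ by $I^{-1}(\hat{\lambda})$, and this can be used to obtain asymptotic confidence
intervals for $\theta$ and $\alpha$.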


10. Applications 


In this section, we analyse two real life data sets to fit the length biased weighted new
quasi Lindley distribution, and the model is compared with the new quasi Lindley, quasi Lindley,
Lindley and exponential distributions. The results obtained from the two data sets are used to show that the
length biased weighted new quasi Lindley distribution fits better than these four distributions.
The two real life data sets are given below:


Data set 1: The first data set gives the time to failure ($10^{3}$ h) of the turbocharger of one type of engine, studied
by Xu et al. (2003). The first data set is given as follows:


1.6, 8.4, 8.1, 7.9, 3.5, 2, 8.4, 8.3, 4.8, 3.9, 2.6, 8.5, 5.4, 5, 4.5, 3, 6.0, 5.6, 5.1, 4.6, 6.5, 6.1, 5.8, 5.3, 7, 6.5, 6.3, 
6, 7.3, 7.1, 6.7, 8.7, 7.7, 7.3, 7.3, 8.8, 8, 7.8, 7.7, 9 

Data set 2: The second data set gives the lifetimes of 40 patients suffering from blood cancer (leukemia) from one
of the Ministry of Health hospitals in Saudi Arabia (see Abouammoh et al. 2000). The ordered lifetimes (in years)
are given as follows:

0.315, 0.496, 0.616, 1.145, 1.208, 1.263, 1.414, 2.025, 2.036, 2.162, 2.211, 2.37, 2.532, 2.693, 2.805, 2.91, 
2.912, 2.192, 3.263, 3.348, 3.348, 3.427, 3.499, 3.534, 3.767, 3.751, 3.858, 3.986, 4.049, 4.244, 4.323, 4.381, 
4.392, 4.397, 4.647, 4.753, 4.929, 4.973, 5.074, 5.381 


Table 1: Maximum likelihood estimates (MLE), corresponding standard errors (S.E), and criterion values −2logL, AIC, BIC and
AICC for the fitted distributions (Data set 1).


Distribution   MLE                       S.E              −2logL     AIC        BIC        AICC
LBWNQL         α̂ = 3.6391, θ̂ = 4.7974    4.8440, 4.3797   189.011    193.011    196.3888   193.3353
NQL            α̂ = 3.3626, θ̂ = 3.1980    3.3560, 3.5756   201.0361   205.0361   208.4138   205.3604
QL             α̂ = 0.0010, θ̂ = 0.3197    0.2123, 0.0311   201.0589   205.0589   208.4367   205.3832
Lindley        θ̂ = 0.2844                0.0321           208.5708   210.5708   212.2597   210.6760
Exponential    θ̂ = 0.1599                0.0252           226.6385   228.6385   230.3274   228.7437

Table 2: Maximum likelihood estimates (MLE), corresponding standard errors (S.E), and criterion values −2logL, AIC, BIC and
AICC for the fitted distributions (Data set 2).


Distribution   MLE                             S.E                    −2logL     AIC        BIC        AICC
LBWNQL         α̂ = 8.9514, θ̂ = 0.9402          38.4471, 0.1044        147.461    151.461    154.8388   151.7853
NQL            α̂ = 1.6173, θ̂ = 6.3670          1.6778, 7.1183         152.7528   156.7528   160.1306   157.0771
QL             α̂ = 0.0010000, θ̂ = 0.6365481    0.5536723, 0.1545506   152.766    156.766    160.1437   157.0903
Lindley        θ̂ = 0.5269                      0.0607                 160.5012   162.5012   164.19     162.6064
Exponential    θ̂ = 0.3183                      0.0503                 171.5563   173.5563   175.2452   173.6615


R software is used to carry out the numerical analysis of the two data sets, to estimate the unknown
parameters, and to compute the model comparison criteria. To compare the length biased weighted
new quasi Lindley distribution with the new quasi Lindley, quasi Lindley, Lindley and exponential distributions,
we consider the criteria AIC (Akaike information criterion), AICC (corrected Akaike information
criterion) and BIC (Bayesian information criterion). The better distribution is the one with the smaller values of
AIC, BIC, AICC and -2logL. The criteria are computed as

$$\mathrm{AIC} = 2k - 2\log L, \qquad \mathrm{AICC} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad \mathrm{BIC} = k\log n - 2\log L,$$

where k is the number of parameters in the statistical model, n is the sample size and log L is the maximized
value of the log-likelihood function under the considered model.
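The three criteria are simple enough to check against the tabulated values. The following Python sketch is only an illustration (the original analysis was carried out in R, and the function name `information_criteria` is ours); it takes -2logL, k and n and returns AIC, AICC and BIC. The example call uses the LBWNQL row of Table 2, assuming k = 2 parameters and n = 40 observations.

```python
import math

def information_criteria(neg2_loglik, k, n):
    """Return (AIC, AICC, BIC) from -2 log L, k parameters and sample size n."""
    aic = 2 * k + neg2_loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) + neg2_loglik
    return aic, aicc, bic

# LBWNQL fitted to data set 2 (Table 2): -2 log L = 147.461, k = 2, n = 40.
aic, aicc, bic = information_criteria(147.461, k=2, n=40)
print(round(aic, 3), round(aicc, 4), round(bic, 4))
```

The printed values can be compared with the AIC, AICC and BIC entries of the same row in Table 2.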

From Tables 1 and 2, it is observed that the length biased weighted new quasi Lindley
distribution has lower AIC, AICC, BIC and -2logL values than the new quasi Lindley, quasi
Lindley, Lindley and exponential distributions. Hence, we conclude that the length biased weighted new quasi
Lindley distribution leads to a better fit than the new quasi Lindley, quasi Lindley, Lindley and exponential
distributions.


11. Conclusion 


In the present study, we have introduced the length biased weighted new quasi Lindley distribution as a new
generalization of the new quasi Lindley distribution. The distribution is generated by using the length
biased technique with the two-parameter new quasi Lindley distribution as the base distribution. Various
mathematical and statistical properties of the new distribution, along with its reliability
measures, are discussed. The parameters of the proposed distribution are estimated by the method of
maximum likelihood, and the Fisher information matrix is also discussed. Finally, the
application of the new distribution is illustrated with two real-life data sets. On both data sets, the
length biased weighted new quasi Lindley distribution is compared with the new quasi Lindley, quasi Lindley,
Lindley and exponential distributions, and the results indicate that it provides a better fit than
these competing distributions.
A New Alpha Power Transformed
Weibull Distribution 


Properties and Applications 


Ashok Kumar Pathak, Mohd Arshad, Sanjeev Bakshi,
Mukti Khetan and Sherry Mangla


1. Introduction 


The exponential and Rayleigh distributions are two commonly used lifetime distributions
with broad applications in reliability and survival analysis. Because their failure rate functions are
constant and linearly increasing, respectively, these distributions have limited use for the wide class of real data
that exhibit more complex behaviour. The need to model real-world phenomena motivates
statisticians to construct new flexible families of distributions with applications in diverse areas of the
applied sciences such as reliability, engineering, medicine, energy, finance and insurance. The linear exponential
(LE) distribution includes the exponential and Rayleigh distributions as two important sub-models for modelling
lifetime data with linearly increasing failure rates. However, the LE distribution does not provide
a good fit to data arising in reliability analysis and biological studies where hazard rates are decreasing,
non-linearly increasing, or bathtub shaped. Several generalizations of the exponential,
Rayleigh, and LE distributions have been studied in the recent past; see, for example,
Gupta and Kundu (1999), Mahmoud and Alam (2010), Sarhan et al. (2013), Tian et al. (2014), Khan
et al. (2017), and Pathak et al. (2021).

The Weibull distribution is one of the most popular lifetime distributions, with diverse applications in
different disciplines (see Mudholkar and Srivastava (1993), Lai (2014)). A random variable X has
a Weibull distribution with parameters $\beta$ and $\lambda$ if its cumulative distribution function (CDF) and probability
density function (PDF) are given by

$$G_X(x) = 1 - e^{-\beta x^{\lambda}}, \quad x > 0,$$

and

$$g_X(x) = \beta\lambda\, x^{\lambda-1} e^{-\beta x^{\lambda}}, \quad x > 0,$$

where $\beta > 0$ and $\lambda > 0$.
The Weibull distribution is a natural extension of the exponential and Rayleigh distributions. The
hazard rate function of the Weibull distribution can be increasing, decreasing, or constant depending on the
choice of parameters, so it may be used as an alternative to the exponential and Rayleigh models in lifetime
data modelling. However, this model is not suitable for real data with bathtub-shaped hazard rate functions. Several
generalizations have been proposed to overcome this
limitation of the Weibull distribution. Important approaches include linear, log and inverse
transformations, increasing the number of shape and location parameters, and power transformation of the
Weibull distribution. Apart from these approaches, some new families of the generalized Weibull distribution 
have also been proposed by mixing two or more Weibull distributions to enhance the applicability of the model 
in dealing with lifetime data. Mudholkar and Srivastava (1993) proposed a three-parameter exponentiated 
Weibull distribution which includes generalized exponential and Weibull distributions as sub-models. This 
model can accommodate bathtub-shaped, unimodal, monotonically increasing, and monotonically decreasing 
hazard rates. A three-parameter extended Weibull distribution has also been studied by Marshall and Olkin 
(1997). Xie et al. (2002) considered a new extension of the Weibull distribution and estimated the model 
parameters using graphical techniques. Several families in these generalizations of the Weibull distributions 
include two or three parameters in the model and provide flexibility in the models. For a good source of 
literature review of these models, one may refer to Pham and Lai (2007) and Lai (2014). 

Apart from these extensions, Weibull models with four and five parameters are also studied in the literature. 
Xie and Lai (1996) introduced a four-parameter additive Weibull distribution by mixing two Weibull survival 
functions, with one having an increasing and the other a decreasing failure rate function and discussed its 
parameter estimation. This distribution is flexible in modelling lifetime data with bathtub hazard rate. A 
four-parameter beta Weibull distribution was studied by Famoye et al. (2005). Another new four-parameter 
generalization of the Weibull distribution using a power transformation is proposed by Carrasco et al. (2008). 
Sarhan and Zaindin (2009) proposed a new four-parameter modified Weibull distribution and demonstrated 
its statistical properties. Recently, some other modified Weibull distributions with five parameters have been 
proposed and studied by Silva et al. (2010), Nadarajah et al. (2011), Sarhan et al. (2013), Almalki and Yuan 
(2013), He et al. (2016), and Abd EL-Baset and Ghazal (2020). 

The main aim of this chapter is to introduce a three-parameter new alpha power transformed Weibull 
distribution and study its various important statistical properties. We denote the new alpha power transformed 
Weibull distribution by the NAPTW distribution. The NAPTW family includes a large class of well-known 
distributions and their generalizations, including exponential, Rayleigh, new alpha power transformed 
exponential (NAPTE) distributions, and more. The hazard rate function of the NAPTW distribution takes 
different shapes. It can be used in the analysis of a wide class of lifetime data. 

The organization of the chapter is as follows: In Section 2, we present a new alpha power transformed 
Weibull (NAPTW) distribution and deduce some known families of the distributions and their extensions 
from it. We present the expressions for survival function, hazard rate function, and moments for the NAPTW 
distribution. Then, we numerically tabulate several measures of descriptive statistics for the NAPTW family. 
We also calculate the distribution of order statistics for the proposed distribution. In Section 3, we discuss 
the estimation of the model parameters using maximum likelihood (ML), least squares (LS), weighted 
least squares (WLS), Anderson Darling (AD) and Cramer von Mises (CvM) methods. We also perform the 
simulation study to demonstrate the performance of the estimators. Finally, two real data sets are examined 
by the NAPTW distribution to demonstrate the applicability of the proposed model in real-life applications. 


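2. A new alpha power transformed Weibull distribution

A random variable X is said to follow the new alpha power transformed Weibull (NAPTW) distribution if its
density function is given by

$$f(x;\alpha,\beta,\lambda) = \frac{\log(\alpha)\,\beta\lambda\,x^{\lambda-1}\,\alpha^{\log\left(1-e^{-\beta x^{\lambda}}\right)}}{e^{\beta x^{\lambda}}-1}, \quad x>0,\ \alpha>1,\ \beta>0,\ \lambda>0. \tag{1}$$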




The family proposed in (1) includes a large class of well-known distributions. Some important
sub-models are listed below:
(i) For $\alpha = e$, model (1) reduces to a Weibull distribution with parameters $\beta$ and $\lambda$.
(ii) When $\alpha = e$ and $\lambda = 2$, (1) leads to a Rayleigh distribution with parameter $\beta$.
(iii) For $\alpha = e$ and $\lambda = 1$, (1) reduces to an exponential distribution with parameter $\beta$. In particular, for $\beta = 1$,
it reduces to the standard exponential distribution.
(iv) For $\lambda = 1$, (1) leads to the new alpha power transformed exponential distribution proposed by Ijaz et al.
(2021).
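The cumulative distribution function and survival function of the NAPTW random variable X are given by

$$F(x;\alpha,\beta,\lambda) = \alpha^{\log\left(1-e^{-\beta x^{\lambda}}\right)}, \quad x > 0,\ \alpha>1,\ \beta>0,\ \lambda>0,$$

and

$$S(x;\alpha,\beta,\lambda) = P(X > x) = 1 - \alpha^{\log\left(1-e^{-\beta x^{\lambda}}\right)}, \quad x > 0,\ \alpha>1,\ \beta>0,\ \lambda>0,$$

respectively.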
The hazard rate function of the NAPTW distribution is

$$h(x;\alpha,\beta,\lambda) = \frac{\log(\alpha)\,\beta\lambda\,x^{\lambda-1}\,\alpha^{\log\left(1-e^{-\beta x^{\lambda}}\right)}}{\left(e^{\beta x^{\lambda}}-1\right)\left(1-\alpha^{\log\left(1-e^{-\beta x^{\lambda}}\right)}\right)}. \tag{2}$$

For different values of the parameters, various density and hazard rate plots are presented in Figure 1
and Figure 2.

From Figure 2, we see that the hazard rate function takes different shapes for several sets of values of the
model parameters. This illustrates the usefulness of the model in practical situations.


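Figure 1: PDF graph of the NAPTW ($\alpha$, $\beta$, $\lambda$) distribution. Curves shown: $\alpha = e$, $\beta = 0.5$, $\lambda = 1.5$; $\alpha = e$, $\beta = 0.5$, $\lambda = 2$; $\alpha = e$, $\beta = 0.5$, $\lambda = 1$; $\alpha = e$, $\beta = 1$, $\lambda = 1$.

Figure 2: Hazard function graph of the NAPTW ($\alpha$, $\beta$, $\lambda$) distribution. Curves shown: $\alpha = 1.5$, $\beta = 0.1$, $\lambda = 1$; $\alpha = 1.5$, $\beta = 0.5$, $\lambda = 1$; $\alpha = 3$, $\beta = 0.8$, $\lambda = 2$; $\alpha = 4$, $\beta = 1$, $\lambda = 2.5$.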
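The density (1), the CDF, the survival function and the hazard rate (2) can be evaluated with a few lines of code. The Python sketch below is illustrative only: the function names are ours, and `expm1` is used as an implementation choice to keep $1 - e^{-\beta x^{\lambda}}$ and $e^{\beta x^{\lambda}} - 1$ accurate for small $x$.

```python
import math

def naptw_cdf(x, alpha, beta, lam):
    """F(x) = alpha ** log(1 - exp(-beta * x**lam)) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    log_weibull_cdf = math.log(-math.expm1(-beta * x ** lam))
    return math.exp(math.log(alpha) * log_weibull_cdf)

def naptw_pdf(x, alpha, beta, lam):
    """Density (1): log(alpha) * beta * lam * x**(lam-1) * F(x) / (exp(beta * x**lam) - 1)."""
    if x <= 0:
        return 0.0
    return (math.log(alpha) * beta * lam * x ** (lam - 1)
            * naptw_cdf(x, alpha, beta, lam) / math.expm1(beta * x ** lam))

def naptw_survival(x, alpha, beta, lam):
    """S(x) = 1 - F(x)."""
    return 1.0 - naptw_cdf(x, alpha, beta, lam)

def naptw_hazard(x, alpha, beta, lam):
    """Hazard rate (2), computed as the ratio f(x) / S(x)."""
    return naptw_pdf(x, alpha, beta, lam) / naptw_survival(x, alpha, beta, lam)

# With alpha = e and lam = 1 the model is exponential, so the hazard is constant (= beta).
for x in (0.5, 1.0, 2.0):
    print(x, naptw_hazard(x, math.e, 0.5, 1.0))
```

Evaluating `naptw_hazard` on a grid of $x$ values for the parameter sets listed under Figure 2 gives the kind of curves shown there.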


2.1 Moments 


The evaluation of the moments is essential for studying the important statistical properties of a
distribution. Several measures of central tendency, dispersion, skewness, and kurtosis are expressed in terms of the
moments. Let X be a random variable with density function f(x). Then the $k$th order raw moment is defined by

$$E(X^k) = \int_0^{\infty} x^k f(x)\,dx.$$

For the NAPTW distribution, the $k$th order raw moment is given by

$$E(X^k) = \log(\alpha)\,\beta\lambda \int_0^{\infty} \frac{x^{k+\lambda-1}\,\alpha^{\log\left(1-e^{-\beta x^{\lambda}}\right)}}{e^{\beta x^{\lambda}}-1}\,dx. \tag{3}$$

The integral in (3) has no simple closed form, so an analytical
evaluation of (3) is quite difficult. We therefore evaluate equation (3) numerically for different values of k and
the parameters. With the help of these values, we present various descriptive measures such as the mean,
variance, skewness ($\gamma_1$) and kurtosis ($\beta_2$) of the NAPTW distribution in Table 1 and study their behaviour.
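A numerical evaluation of the moments can be sketched as follows. This is not the authors' procedure: instead of integrating the integrand of (3) directly, the sketch uses the equivalent tail identity $E(X^k) = \int_0^{\infty} k x^{k-1} S(x)\,dx$ (valid for non-negative random variables), which avoids the possible singularity of the density at the origin, and applies a composite Simpson rule on a truncated range. The truncation point `upper`, the number of steps and all function names are assumptions made for illustration.

```python
import math

def naptw_survival(x, alpha, beta, lam):
    """S(x) = 1 - alpha ** log(1 - exp(-beta * x**lam)), with S(x) = 1 for x <= 0."""
    if x <= 0:
        return 1.0
    return 1.0 - math.exp(math.log(alpha) * math.log(-math.expm1(-beta * x ** lam)))

def raw_moment(k, alpha, beta, lam, upper=50.0, steps=20000):
    """Approximate E(X^k) = int_0^upper k x^(k-1) S(x) dx by composite Simpson (steps even)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * k * x ** (k - 1) * naptw_survival(x, alpha, beta, lam)
    return total * h / 3.0

def descriptive_measures(alpha, beta, lam):
    """Return (mean, variance, skewness gamma_1, kurtosis beta_2) from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, alpha, beta, lam) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return m1, var, mu3 / var ** 1.5, mu4 / var ** 2

# One of the parameter settings of Table 1: alpha = 2.72, beta = 1.5, lambda = 2.
print([round(v, 2) for v in descriptive_measures(2.72, 1.5, 2.0)])
```

For heavy-tailed settings (small $\lambda$), `upper` would need to be increased so that the neglected tail is negligible for the highest moment required.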


2.2 Quantile function, skewness and kurtosis


The quantile function is the inverse of the cumulative distribution function; it gives the value below which a
given proportion of the distribution lies. It is also the basic tool for generating random
data from non-uniform random variables. For a random variable X with distribution function F(x),
it is defined by

$$Q(q) = F^{-1}(q) = \inf\{x \in \mathbb{R} : F(x) \ge q\}, \quad \text{for } 0 < q < 1.$$
Table 1: Mean, variance, skewness and kurtosis of NAPTW distribution.

$\alpha$ | $\beta$ | $\lambda$ | Mean | Variance | $\gamma_1$ | $\beta_2$
2.72 | 1.5 | 0.5 | 0.89 | 3.95 | 6.62 | 87.72
2.72 | 1.5 | 1 | 0.67 | 0.44 | 2.00 | 9.00
2.72 | 1.5 | 1.5 | 0.69 | 0.22 | 1.07 | 4.39
2.72 | 1.5 | 2 | 0.72 | 0.14 | 0.63 | 3.25
2.72 | 1 | 2 | 0.89 | 0.21 | 0.63 | 3.25
2.72 | 2 | 2 | 0.63 | 0.11 | 0.63 | 3.25
2.72 | 3 | 2 | 0.51 | 0.07 | 0.63 | 3.25
2.72 | 4 | 2 | 0.44 | 0.05 | 0.63 | 3.25

1.50 | 1.5 | 2 | 0.45 | 0.14 | 1.06 |
2.50 | 1.5 | 2 | 0.70 | 0.14 | 0.66 |
3.50 | 1.5 | 2 | 0.79 | 0.14 | 0.57 |
4.50 | 1.5 | 2 | 0.82 | 0.14 | 0.55 |


The $q$th quantile $x_q$ of the distribution is obtained by solving

$$F(x_q) = q.$$

For the NAPTW distribution, it is given by

$$Q(q) = x_q = \left[-\frac{1}{\beta}\log\left(1 - q^{1/\log(\alpha)}\right)\right]^{1/\lambda}, \tag{4}$$

where, for random number generation, $q$ is drawn from the uniform distribution U(0,1).
The three quartiles $Q_1 = Q(0.25)$, $Q_2 = Q(0.5)$, and $Q_3 = Q(0.75)$ are very useful for summarizing
data. $Q_2 = Q(0.5)$ is the median of the distribution, and the interquartile range is given by $Q_3 - Q_1$.
With the help of (4), we can calculate Bowley's skewness $B$ and Moors' kurtosis $M$ (see Arshad et al.
(2021)) by

$$B = \frac{Q(0.75) + Q(0.25) - 2Q(0.5)}{Q(0.75) - Q(0.25)}$$

and

$$M = \frac{Q(0.875) - Q(0.625) + Q(0.375) - Q(0.125)}{Q(0.75) - Q(0.25)}.$$
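The quantile function obtained by inverting the NAPTW CDF, the two quantile-based shape measures, and inverse-transform sampling can be sketched in Python as follows. The function names and the `seed` argument are illustrative choices, not part of the chapter.

```python
import math
import random

def naptw_quantile(q, alpha, beta, lam):
    """Q(q) = [-(1/beta) * log(1 - q**(1/log(alpha)))]**(1/lam), obtained by solving F(x) = q."""
    g = q ** (1.0 / math.log(alpha))          # value of the Weibull CDF at the quantile
    return (-math.log1p(-g) / beta) ** (1.0 / lam)

def bowley_skewness(alpha, beta, lam):
    """Quartile-based skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    Q = lambda p: naptw_quantile(p, alpha, beta, lam)
    return (Q(0.75) + Q(0.25) - 2 * Q(0.5)) / (Q(0.75) - Q(0.25))

def moors_kurtosis(alpha, beta, lam):
    """Octile-based kurtosis: (Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)) / (Q3 - Q1)."""
    Q = lambda p: naptw_quantile(p, alpha, beta, lam)
    return (Q(0.875) - Q(0.625) + Q(0.375) - Q(0.125)) / (Q(0.75) - Q(0.25))

def naptw_sample(n, alpha, beta, lam, seed=None):
    """Draw n NAPTW variates by applying the quantile function to U(0,1) draws."""
    rng = random.Random(seed)
    return [naptw_quantile(rng.random(), alpha, beta, lam) for _ in range(n)]

print(naptw_quantile(0.5, 2.4, 4.0, 2.0))     # median for alpha = 2.4, beta = 4.0, lambda = 2.0
print(bowley_skewness(2.4, 4.0, 2.0), moors_kurtosis(2.4, 4.0, 2.0))
print(naptw_sample(5, 2.4, 4.0, 2.0, seed=1))
```

Because the quantile function is available in closed form, no root-finding is needed for simulation.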


2.3 Order statistics 


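In this subsection, we derive the distribution and density functions of the order statistics from the NAPTW
distribution. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with CDF $F_X(x)$ and PDF $f_X(x)$. If
$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the order statistics of $X_1, X_2, \ldots, X_n$, then the distribution and density functions of the $s$th
order statistic $X_{(s)}$ ($s = 1, 2, \ldots, n$) are given by

$$F_{X_{(s)}}(x) = \sum_{j=s}^{n} \binom{n}{j} \left[F_X(x)\right]^{j} \left[1-F_X(x)\right]^{n-j}$$

and

$$f_{X_{(s)}}(x) = \frac{n!}{(s-1)!\,(n-s)!} \left[F_X(x)\right]^{s-1} \left[1-F_X(x)\right]^{n-s} f_X(x).$$

For the NAPTW variates, the distributions of the order statistics are given by

$$F_{X_{(s)}}(x) = \sum_{j=s}^{n} \binom{n}{j}\, \alpha^{j\log\left(1-e^{-\beta x^{\lambda}}\right)} \left[1-\alpha^{\log\left(1-e^{-\beta x^{\lambda}}\right)}\right]^{n-j}$$

and

$$f_{X_{(s)}}(x) = \frac{n!\,\log(\alpha)\,\beta\lambda\,x^{\lambda-1}\,\alpha^{s\log\left(1-e^{-\beta x^{\lambda}}\right)} \left[1-\alpha^{\log\left(1-e^{-\beta x^{\lambda}}\right)}\right]^{n-s}}{(s-1)!\,(n-s)!\,\left(e^{\beta x^{\lambda}}-1\right)}.$$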
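The general order-statistic formulas can be evaluated directly once the NAPTW CDF and PDF are available. The following Python sketch (function names are ours) computes the CDF and PDF of the $s$th order statistic in a sample of size $n$ using the standard binomial-sum and factorial expressions.

```python
import math

def naptw_cdf(x, alpha, beta, lam):
    """NAPTW CDF: alpha ** log(1 - exp(-beta * x**lam)) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return math.exp(math.log(alpha) * math.log(-math.expm1(-beta * x ** lam)))

def naptw_pdf(x, alpha, beta, lam):
    """NAPTW density for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return (math.log(alpha) * beta * lam * x ** (lam - 1)
            * naptw_cdf(x, alpha, beta, lam) / math.expm1(beta * x ** lam))

def order_stat_cdf(x, s, n, alpha, beta, lam):
    """P(X_(s) <= x) = sum_{j=s}^{n} C(n, j) F^j (1 - F)^(n - j)."""
    F = naptw_cdf(x, alpha, beta, lam)
    return sum(math.comb(n, j) * F ** j * (1 - F) ** (n - j) for j in range(s, n + 1))

def order_stat_pdf(x, s, n, alpha, beta, lam):
    """Density of X_(s): n! / ((s-1)! (n-s)!) * F^(s-1) * (1 - F)^(n-s) * f."""
    F = naptw_cdf(x, alpha, beta, lam)
    f = naptw_pdf(x, alpha, beta, lam)
    coef = math.factorial(n) / (math.factorial(s - 1) * math.factorial(n - s))
    return coef * F ** (s - 1) * (1 - F) ** (n - s) * f

# Distribution of the sample minimum (s = 1) and maximum (s = n) for n = 5.
print(order_stat_cdf(0.8, 1, 5, 3.2, 1.5, 2.6), order_stat_cdf(0.8, 5, 5, 3.2, 1.5, 2.6))
```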


3. Parameter estimation and numerical experiments 


In this section, we discuss the estimation of the model parameters using various methods and perform two
sets of numerical experiments. The inverse probability integral transform is used to simulate
$N = 5000$ samples of each of the predetermined sizes 200, 500, 800, 1000 and 1200. These samples are simulated
for the parameter combinations $\alpha = 2.4$, $\beta = 4.0$, $\lambda = 2.0$ (Table 2) and $\alpha = 3.2$, $\beta = 1.5$, $\lambda = 2.6$
(Table 3). For each of these samples, the maximum likelihood (ML), least squares (LS),
weighted least squares (WLS), Anderson Darling (AD) and Cramer von Mises (CvM) estimates of the NAPTW
parameters are obtained. Determining the ML estimates requires maximizing the likelihood function of the
NAPTW distribution. Let $X_1, X_2, \ldots, X_n$ be a random sample from the NAPTW distribution. The likelihood
function of the NAPTW distribution is given by

$$L(\theta) = \prod_{i=1}^{n} f(x_i;\alpha,\beta,\lambda) = \prod_{i=1}^{n} \log(\alpha)\,\beta\lambda\,x_i^{\lambda-1}\,\frac{\alpha^{\log\left(1-e^{-\beta x_i^{\lambda}}\right)}}{e^{\beta x_i^{\lambda}}-1}.$$

After taking logarithms on both sides, the log-likelihood function of the NAPTW distribution is given by

$$\ell(\theta) = n\log\{\lambda\beta\log(\alpha)\} + (\lambda-1)\sum_{i=1}^{n}\log x_i + \log(\alpha)\sum_{i=1}^{n}\log\left(1-e^{-\beta x_i^{\lambda}}\right) - \sum_{i=1}^{n}\log\left(e^{\beta x_i^{\lambda}}-1\right).$$

Here $\theta = (\alpha, \beta, \lambda)$.
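The log-likelihood translates directly into code. The Python sketch below evaluates it term by term for a given parameter vector; the function name is ours, the maximization step (done with `optim` in R in this chapter) is not shown, and the small sample is just a handful of values taken from the bladder cancer data for illustration.

```python
import math

def naptw_loglik(theta, data):
    """Log-likelihood of the NAPTW model at theta = (alpha, beta, lam) for positive data."""
    alpha, beta, lam = theta
    n = len(data)
    sum_log_x = sum(math.log(x) for x in data)
    # sum of log(1 - exp(-beta * x**lam)), the log of the Weibull CDF
    sum_log_g = sum(math.log(-math.expm1(-beta * x ** lam)) for x in data)
    # sum of log(exp(beta * x**lam) - 1)
    sum_log_d = sum(math.log(math.expm1(beta * x ** lam)) for x in data)
    return (n * math.log(lam * beta * math.log(alpha))
            + (lam - 1) * sum_log_x
            + math.log(alpha) * sum_log_g
            - sum_log_d)

# A few remission times from the BCP data, evaluated at the NAPTW estimates of Table 5.
sample = [0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63]
print(naptw_loglik((16.3820, 0.4537, 0.6544), sample))
```

A numerical optimizer would be applied to the negative of this function, subject to $\alpha > 1$, $\beta > 0$ and $\lambda > 0$.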
Table 2: Estimated average bias (AB) and root mean square error (RMSE) for the maximum likelihood (ML), least squares
(LS), weighted least squares (WLS), Anderson Darling (AD) and Cramer von Mises (CvM) estimators of the
parameters of the NAPTW distribution with parameters $\alpha = 2.4$, $\beta = 4.0$, $\lambda = 2.0$.

Parameter | Sample size | ML Estimates (AB, RMSE) | LS Estimates (AB, RMSE) | WLS Estimates (AB, RMSE) | AD Estimates (AB, RMSE) | CvM Estimates (AB, RMSE)

200 59.31 | 0.56 2.83 | 0.66 | 8.76 | 3.61 53.42
500 1.48 | 0.12 0.66 0.13 | 0.65 | 0.28 1.51
800 0.69 | 0.07 0.45 | 0.07 | 0.45 | 0.14 0.70
1000 0.55 | 0.05 0.38 | 0.06 | 0.38 | 0.10 0.55
1200 0.47 | 0.04 0.34 | 0.04 | 0.34 | 0.08 0.47
200 0.67 | 0.10 0.50 | 0.10 | 0.47 | 0.22 0.70
500 0.32 | 0.03 0.27 0.04 | 0.27 | 0.08 0.33


0.05 


0.25 


0.21 0.02 0.21 


800 0.02 
Table 3: Estimated average bias (AB) and root mean square error (RMSE) for the maximum likelihood (ML), least squares (LS),
weighted least squares (WLS), Anderson Darling (AD) and Cramer von Mises (CvM) estimators of the
parameters of the NAPTW distribution with parameters $\alpha = 3.2$, $\beta = 1.5$, $\lambda = 2.6$.

Parameter | Sample size | ML Estimates (AB, RMSE) | LS Estimates (AB, RMSE) | WLS Estimates (AB, RMSE) | AD Estimates (AB, RMSE) | CvM Estimates (AB, RMSE)

0.93 | 5.08 334.31 | 34.08 | 644.26 

0.20 | 1.06 151 | 1.00 | 11.95 

a ou | 0.72 0.93 | 0.37 1.81 
0.09 | 0.63 0.78 | 0.27 1.27 

0.07 | 0.55 0.67 | 0.21 1.00 


0.02 0.36 


0.52 


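The LS, WLS, AD and CvM estimates are obtained by minimizing, with respect to $\theta$, the functions
$S(\theta)$, $W(\theta)$, $A(\theta)$ and $C(\theta)$, respectively. These functions are defined as follows:

$$S(\theta) = \sum_{i=1}^{n} \left[F(x_{(i)}) - \frac{i}{n+1}\right]^{2},$$

$$W(\theta) = \sum_{i=1}^{n} \frac{1}{\mathrm{Var}\left(F(X_{(i)})\right)} \left[F(x_{(i)}) - \frac{i}{n+1}\right]^{2}, \qquad \frac{1}{\mathrm{Var}\left(F(X_{(i)})\right)} = \frac{(n+1)^{2}(n+2)}{i(n-i+1)},$$

$$A(\theta) = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left\{\ln F(x_{(i)}) + \ln\left(1-F(x_{(n-i+1)})\right)\right\},$$

$$C(\theta) = \frac{1}{12n} + \sum_{i=1}^{n} \left[F(x_{(i)}) - \frac{2i-1}{2n}\right]^{2},$$

where $F(\cdot)$ is the CDF of the NAPTW distribution and $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ denote the ordered observations of a given sample.
The ML, LS, WLS, AD and CvM objective functions are optimized using the optim function in R (R Core Team 2017).
For data simulation, one may refer to the paper by Arshad et al. (2021).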
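The four minimum-distance objectives share the same input, namely the fitted CDF evaluated at the ordered observations. The Python sketch below writes them in their standard textbook forms, which is how we read the definitions above; the function names are ours, and the minimization itself (performed with `optim` in R in the chapter) is left out.

```python
import math

def naptw_cdf(x, alpha, beta, lam):
    """NAPTW CDF: alpha ** log(1 - exp(-beta * x**lam)) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return math.exp(math.log(alpha) * math.log(-math.expm1(-beta * x ** lam)))

def fitted_probabilities(theta, data):
    """F(x_(i); theta) at the ordered observations x_(1) <= ... <= x_(n)."""
    alpha, beta, lam = theta
    return [naptw_cdf(x, alpha, beta, lam) for x in sorted(data)]

def ls_objective(u):
    n = len(u)
    return sum((u[i - 1] - i / (n + 1)) ** 2 for i in range(1, n + 1))

def wls_objective(u):
    n = len(u)
    return sum((n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) * (u[i - 1] - i / (n + 1)) ** 2
               for i in range(1, n + 1))

def ad_objective(u):
    n = len(u)
    # u[n - i] is F(x_(n-i+1)) in zero-based indexing
    return -n - sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))
                    for i in range(1, n + 1)) / n

def cvm_objective(u):
    n = len(u)
    return 1 / (12 * n) + sum((u[i - 1] - (2 * i - 1) / (2 * n)) ** 2 for i in range(1, n + 1))

# Objective values for a small illustrative sample at a trial parameter vector.
u = fitted_probabilities((2.4, 4.0, 2.0), [0.21, 0.35, 0.48, 0.52, 0.66, 0.81])
print(ls_objective(u), wls_objective(u), ad_objective(u), cvm_objective(u))
```

Each estimator is then the parameter vector that minimizes the corresponding objective over $\alpha > 1$, $\beta > 0$, $\lambda > 0$.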

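Average bias (AB) and root mean square error (RMSE) are estimated for all selected sample sizes for
each of the estimators discussed above. For an estimator $\hat{\theta}$ of a parameter $\theta$, with $\hat{\theta}_i$ denoting the
estimate obtained from the $i$th simulated sample, the estimates of AB and RMSE are
given as:

$$\mathrm{AB} = \frac{1}{N}\sum_{i=1}^{N} \left|\hat{\theta}_i - \theta\right|$$

and

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(\hat{\theta}_i - \theta\right)^{2}}.$$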
The change in AB and RMSE with increasing sample size is investigated. For a given sample size,
the AB and RMSE of the estimators are compared. Tables 2 and 3 provide the estimated AB and RMSE
for the estimators of $\alpha$, $\beta$ and $\lambda$. From the results, it can be verified that the AB and the RMSE decrease as
the sample size increases. Further, the AB and RMSE for the MLE are found to be lower than those of the LS, WLS,
AD and CvM estimators. Hence, the MLE is a better choice for estimating the parameters of the NAPTW
distribution.
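Given the estimates obtained from the $N$ simulated samples, the two accuracy measures are straightforward to compute. The Python sketch below assumes that AB averages the absolute deviations from the true value; the function names and the short list of replicate estimates are hypothetical and serve only to show the calculation.

```python
import math

def average_bias(estimates, true_value):
    """Mean absolute deviation of the replicate estimates from the true parameter value."""
    return sum(abs(e - true_value) for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    """Root mean square error of the replicate estimates."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

# Hypothetical replicate estimates of lambda when the true value is 2.0.
lambda_hats = [1.9, 2.1, 2.0, 2.4]
print(average_bias(lambda_hats, 2.0), rmse(lambda_hats, 2.0))
```

In a full simulation study these functions would be applied, for each sample size and each estimation method, to the 5000 estimates of each parameter.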


3.1 Real data applications 


The NAPTW distribution admits the exponential, the Weibull and the new alpha power transformed
exponential (NAPTE) distributions as some of its sub-models. The fit of the NAPTW is compared with the fit of these
selected sub-models for two data sets: the bladder cancer patients' (BCP) data and the
bank customers' (BC) data. The model fit for each data set is assessed using several goodness of
fit statistics. The statistics used for this purpose
are $-2\ln(L_M)$, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the AIC
corrected for small samples (AIC$_c$), the Consistent Akaike Information Criterion (CAIC) and the
Hannan-Quinn Information Criterion (HQIC). Here, $L_M$ denotes the maximized value of the likelihood function,
on which all of these measures are based. Other goodness of fit statistics are $W^*$,
$A^*$ (Chen and Balakrishnan (1995)) and the sum of squares (SS). The smaller the values of these statistics,
the better the fit of the model. In addition, the Kolmogorov-Smirnov (KS), Cramer von
Mises (CvM) and Anderson-Darling (AD) statistics, along with their p-values, are also used.
Let $L_M$, $p$ and $n$ denote the maximized value of the likelihood function, the number of estimated parameters
and the size of the sample, respectively. The statistics discussed above are defined as follows:


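1. $\mathrm{AIC} = -2\ln(L_M) + 2p$
2. $\mathrm{AIC}_c = -2\ln(L_M) + 2p + \frac{2p(p+1)}{n-p-1}$
3. $\mathrm{BIC} = -2\ln(L_M) + p\ln(n)$
4. $\mathrm{CAIC} = -2\ln(L_M) + p\{\ln(n) + 1\}$
5. $\mathrm{HQIC} = -2\ln(L_M) + 2p\ln(\ln(n))$
6. $W^* = W^2\left(1 + \frac{0.5}{n}\right)$
7. $A^* = A^2\left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right)$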
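The five likelihood-based criteria in items 1 to 5 depend only on $-2\ln(L_M)$, $p$ and $n$, so they can be recomputed from any row of Table 5. The Python sketch below (the function name `fit_criteria` is ours) does this for the Weibull fit to the BCP data, taking $p = 2$ and $n = 128$.

```python
import math

def fit_criteria(neg2_loglik, p, n):
    """Return AIC, AICc, BIC, CAIC and HQIC from -2 ln(L_M), p parameters and sample size n."""
    return {
        "AIC": neg2_loglik + 2 * p,
        "AICc": neg2_loglik + 2 * p + 2 * p * (p + 1) / (n - p - 1),
        "BIC": neg2_loglik + p * math.log(n),
        "CAIC": neg2_loglik + p * (math.log(n) + 1),
        "HQIC": neg2_loglik + 2 * p * math.log(math.log(n)),
    }

# Weibull fit to the BCP data (Table 5): -2 ln(L_M) = 828.17, p = 2, n = 128.
weibull_bcp = fit_criteria(828.17, p=2, n=128)
for name, value in weibull_bcp.items():
    print(name, round(value, 2))
```

Small differences in the last printed digit relative to Table 5 are to be expected, since the tabulated $-2\ln(L_M)$ is itself rounded.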


The steps involved in the computation of $W^2$ and $A^2$ are as follows. Consider the sample
$(x_1, x_2, \ldots, x_n)$, where $x_1, x_2, \ldots, x_n$ are arranged in ascending order. Let $F(x;\theta)$ be the CDF of the population
from which the sample is drawn, and let $\hat{\theta}$ be the MLE of $\theta$ based on the given sample. We define the
following statistics:

$$u_i = F(x_i;\hat{\theta}),$$

$$y_i = \Phi^{-1}(u_i),$$

$$v_i = \Phi\left(\frac{y_i - \bar{y}}{s_y}\right),$$

where $\Phi$ denotes the CDF of a standard normal variable, $\Phi^{-1}$ denotes its inverse, and $\bar{y}$ and $s_y$ are the
sample mean and sample standard deviation of $y_1, \ldots, y_n$. Using the above
statistics, $W^2$ and $A^2$ are defined as follows:

$$W^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left(v_i - \frac{2i-1}{2n}\right)^{2},$$

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} \left\{(2i-1)\ln(v_i) + (2n+1-2i)\ln(1-v_i)\right\}.$$
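The computation of $W^*$ and $A^*$ can be sketched in Python with the standard library alone. The sketch follows the Chen and Balakrishnan (1995) procedure as we understand it: the fitted CDF values are mapped to normal scores, standardized with the sample mean and the (n-1)-denominator sample standard deviation, and mapped back to the unit interval before the Cramer von Mises and Anderson Darling formulas are applied. The function name and the evenly spaced test probabilities are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

def chen_balakrishnan(cdf_values):
    """Return (W*, A*) from the fitted CDF values F(x_i; theta_hat), each strictly in (0, 1)."""
    n = len(cdf_values)
    std_normal = NormalDist()
    y = [std_normal.inv_cdf(u) for u in sorted(cdf_values)]    # normal scores
    y_bar, s_y = mean(y), stdev(y)
    v = [std_normal.cdf((yi - y_bar) / s_y) for yi in y]       # re-standardized probabilities
    w2 = 1 / (12 * n) + sum((v[i - 1] - (2 * i - 1) / (2 * n)) ** 2 for i in range(1, n + 1))
    a2 = -n - sum((2 * i - 1) * math.log(v[i - 1]) + (2 * n + 1 - 2 * i) * math.log(1 - v[i - 1])
                  for i in range(1, n + 1)) / n
    return w2 * (1 + 0.5 / n), a2 * (1 + 0.75 / n + 2.25 / n ** 2)

# Illustrative input: evenly spaced probabilities, mimicking a well-fitting model.
u = [(i - 0.5) / 20 for i in range(1, 21)]
w_star, a_star = chen_balakrishnan(u)
print(w_star, a_star)
```

In practice `cdf_values` would be the fitted NAPTW (or sub-model) CDF evaluated at the observed data.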


Besides these measures, LR tests are conducted where the sub models are tested against the full model, 
i.e., NAPTW for each selected data set. 
3.1.1 Bladder cancer patients (BCP) data 


BCP data consists of a sample of 128 remission times (in months) of bladder cancer patients (Aldeni et al. 
(2017)). The sample remission times are below: 


0.08 0.20 0.40 0.50 0.51 0.81 0.90 1.05 1.19 1.26 
1.35 1.40 2.02 2.07 2.09 2.23 2.26 2.46 2.54 2.62 
2.64 2.69 2.75 2.83 3.31 3.36 3.36 3.48 3.52 3.57 
3.64 3.70 3.82 3.88 4.18 4.23 4.40 4.50 4.51 4.87 
4.98 5.06 5.09 5.17 5.32 5.32 5.34 5.41 5.71 5.85 
6.25 6.54 6.76 6.93 6.94 6.97 7.09 7.26 7.28 7.32 
7.63 7.66 7.87 7.93 8.26 8.37 8.53 8.65 8.66 9.02 


9.22 9.47 10.66 10.75 11.25 11.64 11.79 11.98 12.02 12.03 
12.07 12.63 13.11 13.29 14.77 14.83 15.96 16.62 17.12 17.14 
17.36 18.10 19.13 20.28 21.73 22.69 26.31 32.15 34.26 36.66 
43.01 46.12 79.05 1.46 1.76 2.02 5.41 5.49 2.87 3.02 
3.25 7.39 7.59 5.62 13.80 14.24 14.76 10.34 25.74 25.82 
4.26 4.33 4.34 9.74 10.06 7.62 23.63 2.69 


The mean remission time is 9.37 months (Table 4). This data set has also been used by Ijaz et al. (2021) in their
study on the NAPTE distribution. The observations range from 0.08 months to 79.05 months, with a median
remission time of 6.40 months. The first and third quartiles are 3.35 months and 11.84
months, respectively (Table 4). Based on the BCP data, the maximum likelihood estimates of NAPTW and
its sub-models are given in Table 5. The values of the selected goodness of fit statistics for the
NAPTW model, namely, $-2\ln(L_M)$, AIC, BIC, AIC$_c$, CAIC, HQIC, $W^*$, $A^*$ and SS, are
821.36, 825.36, 831.06, 825.46, 833.06, 827.68, 0.044, 0.288 and 0.038, respectively. Each of these values
is lower than the corresponding value for the selected NAPTW sub-models. Hence, NAPTW is a better
model for the BCP data than the NAPTE, the Weibull or the exponential model. The LR tests of the
hypotheses $H_0$: exponential against $H_1$: NAPTW, $H_0$: Weibull against $H_1$: NAPTW, and $H_0$: NAPTE against
$H_1$: NAPTW resulted in statistics with p-values 0.03, 0.01 and 0.03, respectively, indicating NAPTW to be the
preferred choice for modeling the BCP data. The KS, AD and CvM statistics (Table 6) also substantiate the
finding that NAPTW provides a better fit to the BCP data than its sub-models.
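The LR p-values quoted for the BCP data can be checked from the $-2\ln(L_M)$ column of Table 5. The Python sketch below is an illustration under the usual assumption that the LR statistic is asymptotically chi-square distributed with degrees of freedom equal to the number of parameters fixed by the sub-model (2 for the exponential, 1 for the Weibull and the NAPTE); the helper `chi2_sf` is ours and only covers these two cases, for which the chi-square tail probability has a closed form.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable, for df = 1 or df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def lr_test(neg2_loglik_sub, neg2_loglik_full, df):
    """LR statistic and p-value for a sub-model tested against the full model."""
    stat = neg2_loglik_sub - neg2_loglik_full
    return stat, chi2_sf(stat, df)

# -2 ln(L_M) values for the BCP data (Table 5); the full model is NAPTW (821.36).
full = 821.36
for name, value, df in [("Exponential", 828.68, 2), ("Weibull", 828.17, 1), ("NAPTE", 826.16, 1)]:
    stat, p_value = lr_test(value, full, df)
    print(name, round(stat, 2), round(p_value, 2))
```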
Table 4: Descriptive statistics for bladder cancer patients' data and bank customers' data.


Data | Sample size | Minimum | Maximum | 1st quartile | 3rd quartile | Median | Mean
BCP | 128 | 0.08 | 79.05 | 3.35 | 11.84 | 6.40 | 9.37
BC | 100 | 0.80 | 38.50 | 4.68 | 13.03 | 8.10 | 9.89


Table 5: Maximum likelihood estimates for NAPTW and its selected sub-models and the model fit under various goodness of fit criteria
for bladder cancer patients' data and bank customers' data.


Data | Model | $\alpha$ | $\beta$ | $\lambda$ | $-2\log(L_{\max})$ | AIC | BIC | AIC$_c$ | CAIC | HQIC | $W^*$ | $A^*$ | SS
BCP | Exponential | - | 0.1068 | - | 828.68 | 830.68 | 833.54 | 830.72 | 834.54 | 831.84 | 0.119 | 0.716 | 0.174
BCP | Weibull | - | 0.0939 | 1.0478 | 828.17 | 832.17 | 837.88 | 832.27 | 839.88 | 834.49 | 0.131 | 0.786 | 0.150
BCP | NAPTE | 3.3804 | 0.1212 | - | 826.16 | 830.16 | 835.86 | 830.25 | 837.86 | 832.47 | 0.112 | 0.674 | 0.125
BCP | NAPTW | 16.3820 | 0.4537 | 0.6544 | 821.36 | 825.36 | 831.06 | 825.46 | 833.06 | 827.68 | 0.044 | 0.288 | 0.038
BC | Exponential | - | 0.1012 | - | 658.04 | 660.04 | 662.65 | 660.08 | 663.65 | 661.10 | 0.027 | 0.179 | 0.076
BC | Weibull | - | 0.0306 | 1.4573 | 637.46 | 641.46 | 646.67 | 641.59 | 648.67 | 643.57 | 0.063 | 0.396 | 0.058
BC | NAPTE | 8.8766 | 0.1592 | - | 634.19 | 638.19 | 643.40 | 638.31 | 645.40 | 640.30 | 0.021 | 0.143 | 0.021
BC | NAPTW | 14.5255 | 0.2229 | 0.9054 | 634.07 | 638.07 | 643.28 | 638.19 | 645.28 | 640.18 | 0.017 | 0.127 | 0.017


3.1.2 Bank customers (BC) data 


BC data is a sample of waiting times (in minutes) of 100 bank customers (Ghitany et al. 2008). The sample 
waiting times are below: 


0.8 0.8 1.3 1.5 1.8 1.9 1.9 2.1 2.6 2.7 
2.9 3.1 3.2 3.3 3.5 3.6 4.0 4.1 4.2 4.2 
4.3 4.3 4.4 4.4 4.6 4.7 4.7 4.8 4.9 4.9 
5.0 5.3 5.5 5.7 5.7 6.1 6.2 6.2 6.2 6.3
6.7 6.9 7.1 7.1 7.1 7.1 7.4 7.6 7.7 8.0
8.2 8.6 8.6 8.6 8.8 8.8 8.9 8.9 9.5 9.6
9.7 9.8 10.7 10.9 11.0 11.0 11.1 11.2 11.2 11.5
11.9 12.4 12.5 12.9 13.0 13.1 13.3 13.6 13.7 13.9
14.1 15.4 15.4 17.3 17.3 18.1 18.2 18.4 18.9 19.0 
19.9 20.6 21.3 21.4 21.9 23.0 27.0 31.6 33.1 38.5 


The sample mean waiting time is 9.89 minutes. Waiting times range from
0.80 minutes to 38.50 minutes, with a median waiting time of 8.10 minutes. The first and third quartiles
are 4.68 minutes and 13.03 minutes, respectively (Table 4). For the BC data, the values of $-2\ln(L_M)$,
AIC, BIC, AIC$_c$, CAIC, HQIC, $W^*$, $A^*$ and SS are 634.07, 638.07, 643.28, 638.19, 645.28,
640.18, 0.017, 0.127 and 0.017, respectively (Table 5). These are lower than the corresponding values for
Table 6: Kolmogorov-Smirnov (KS), Anderson Darling (AD) and Cramer von Mises (CvM) goodness of fit test statistics and corresponding
p-values (in parentheses) for NAPTW and its selected sub-models fitted to the bladder cancer patients' data and bank customers' data.


Data | Model | KS | AD | CvM
BCP | Exponential | 0.0846 (0.3183) | 1.1736 (0.2777) | 0.1788 (0.3129)
BCP | Weibull | 0.0700 (0.5570) | 0.9579 (0.3799) | 0.1537 (0.3788)
BCP | NAPTE | 0.0725 (0.5115) | 0.7138 (0.5472) | 0.1279 (0.4652)
BCP | NAPTW | 0.0450 (0.9576) | 0.2704 (0.9586) | 0.0403 (0.9321)
BC | Exponential | 0.1730 (0.0050) | 4.2293 (0.0068) | 0.7154 (0.0115)
BC | Weibull | 0.0576 (0.8947) | 0.4049 (0.8436) | 0.0607 (0.8108)
BC | NAPTE | 0.0403 (0.9969) | 0.1457 (0.9990) | 0.0214 (0.9957)
BC | NAPTW | 0.0365 (0.9994) | 0.1279 (0.9996) | 0.0176 (0.9988)


Figure 3: Histogram of relative frequencies and fitted PDFs (exponential, NAPTE, NAPTW and Weibull) for bladder cancer patients' data. Horizontal axis: remission time.


selected NAPTW sub-models. Hence, NAPTW is found to provide a better fit for the BC data. In their study
of the NAPTE model, Ijaz et al. (2021) showed that the NAPTE model fits the BC data better than
the exponential, the Rayleigh, the Weibull and the Weibull exponential models. The present
study finds the NAPTW to be a better choice for modelling the BC data than NAPTE or the other
selected sub-models of NAPTW. The LR tests of the hypotheses $H_0$: exponential against $H_1$: NAPTW, $H_0$:
Weibull against $H_1$: NAPTW, and $H_0$: NAPTE against $H_1$: NAPTW resulted in statistics with p-values 0.01,
0.07 and 0.72, respectively. This implies that NAPTW may be preferred over the exponential model and, at the
10% significance level, over the Weibull model for the BC data, whereas the NAPTE model fits the BC data
about as well as the NAPTW model. Based on the values obtained for the KS, AD and CvM statistics (Table 6),
it can be inferred that NAPTW provides a better fit to the BC data than its sub-models.
Figure 5: Empirical and fitted CDFs for bladder cancer patients' data.
Figure 6: Empirical and fitted CDFs for bank customers' data.


4. Discussion and conclusion 


In this chapter, a three-parameter new alpha power transformed Weibull distribution is formulated. The
proposed distribution includes the new alpha power transformed exponential, Weibull, Rayleigh, and exponential
distributions as important sub-models. Several important statistical properties of the proposed distribution,
such as the survival function, hazard rate, quantile function, and order statistics, are studied. The estimation of the model
parameters is performed using the maximum likelihood, least squares, weighted
least squares, Anderson Darling, and Cramer von Mises methods. Using numerical experiments, the average bias
and root mean square error of the estimates are also reported. Finally, two real data sets are
fitted using the NAPTW distribution. From the analysis of these data sets, we conclude that the NAPTW distribution
provides a better fit than the new alpha power transformed exponential, Weibull and exponential distributions
and is useful in modelling a wide class of real data.
Chapter 20


An Extension of Topp-Leone Distribution 
with Increasing, Decreasing and Bathtub 
Hazard Functions 


Unnati Nigam and Arun Kaushik* 


1. Introduction 


In recent times, various lifetime probability distributions defined on the
positive real line have been proposed. They are widely applicable in fields such as biomedicine and the social sciences, among
others. However, such models are not useful for modeling random variables with a bounded range. The most
common application of bounded-range models is to proportion and percentage
data measured on the unit interval. The Beta distribution is one of the most commonly used distributions for
unit interval data. Two other important unit distributions, the Johnson (see Johnson, 1949) and Kumaraswamy
(see Kumaraswamy, 1980) distributions, are also recommended for modeling unit interval data. However, these
classical models may be inadequate and may pose problems for accurate data analysis. For this reason, various
lifetime distributions have been transformed to the unit interval. Some of these are the unit-Gamma or log-Gamma
(Consul and Jain, 1971), unit-Weibull (see Mazucheli et al., 2018b), log-Lindley (see Gómez-Déniz et al.,
2014), unit Gompertz (see Mazucheli et al., 2019), unit Birnbaum-Saunders (see Mazucheli et al., 2018a),
and unit Burr-XII (see Korkmaz and Chesneau, 2021) distributions, among others.

One important distribution that is widely used for fitting unit range data is the one-parameter
Topp-Leone distribution introduced by Topp and Leone (1955). Nadarajah and Kotz (2003) presented closed form
expressions for the moments of the Topp-Leone (TL) distribution. The Topp-Leone distribution has a J-shaped
frequency curve and has been used by various researchers in their studies. The TL distribution is a mixture
of the generalized triangular distribution and the uniform distribution.

The Topp-Leone distribution has also proved to be effective in generating new flexible families of
distributions. Sharma (2018) introduced the Topp-Leone normal distribution and showed its application
to three real data sets. Shekhawat and Sharma (2021) presented a two-parameter extension of the Topp-Leone
distribution obtained by adding a skewness parameter and showed its application to tissue damage proportions data.
Sangsanit and Bodhisuwan (2016) proposed the Topp-Leone generalized exponential distribution. Alizadeh
et al. (2018) proposed a Topp-Leone odd log-logistic family of distributions with an application to a regression
model.
In this article, we introduce a two-parameter extension of the Topp-Leone distribution using the
transformation given below. Let G(x) and g(x) be the cdf and pdf of a baseline distribution, and let F(x) and f(x)
be the cdf and pdf of the proposed distribution, respectively. Then F(x) is defined as

$$F(x) = \frac{\alpha^{G(x)} - 1}{\alpha - 1}, \quad \alpha > 1, \tag{1}$$

and the corresponding pdf is

$$f(x) = \frac{\log(\alpha)\, g(x)\, \alpha^{G(x)}}{\alpha - 1}, \quad \alpha > 1. \tag{2}$$

For $\alpha = e$, this transformation reduces to the DUS transformation of Kumar et al. (2015).

We take as the baseline the one-parameter Topp-Leone distribution with cdf $G(x) = x^{\alpha}(2-x)^{\alpha}$;
$0<x<1$, $\alpha>0$ and corresponding pdf $g(x) = 2\alpha x^{\alpha-1}(1-x)(2-x)^{\alpha-1}$; $0<x<1$, $\alpha>0$. Using the
transformation proposed in equation (1), the cdf and pdf of the resulting distribution, hereafter referred to as
the Power Exponentiated Topp-Leone (PETL) distribution, can easily be obtained as:

$$F(x) = \frac{a^{(x(2-x))^{\alpha}} - 1}{a - 1}; \quad 0<x<1,\ \alpha>0,\ a>1, \tag{3}$$

$$f(x) = \frac{2\alpha \ln(a)\, x^{\alpha-1}(1-x)(2-x)^{\alpha-1}\, a^{(x(2-x))^{\alpha}}}{a - 1}; \quad 0<x<1,\ \alpha>0,\ a>1. \tag{4}$$


The Power Exponentiated Topp-Leone (PETL) distribution is very flexible as it can accommodate a
variety of shapes of hazard rate and density functions. We use PETL($\alpha$, $a$) to denote the distribution given
in equations (3) and (4).
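As a quick numerical check of equations (3) and (4), the following minimal Python sketch evaluates the PETL cdf and pdf and confirms that the density integrates to one. The function names, the Simpson integrator and the parameter values ($\alpha = 3$, $a = 1.5$) are illustrative assumptions and are not part of the original article.

```python
import math

def petl_cdf(x, alpha, a):
    """PETL cdf, equation (3)."""
    w = x * (2.0 - x)
    return (a ** (w ** alpha) - 1.0) / (a - 1.0)

def petl_pdf(x, alpha, a):
    """PETL pdf, equation (4). For alpha < 1 avoid x = 0 (the density is unbounded there)."""
    w = x * (2.0 - x)
    return (2.0 * alpha * math.log(a) * x ** (alpha - 1.0) * (1.0 - x)
            * (2.0 - x) ** (alpha - 1.0) * a ** (w ** alpha) / (a - 1.0))

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0

alpha, a = 3.0, 1.5  # illustrative parameter values
area = simpson(lambda x: petl_pdf(x, alpha, a), 0.0, 1.0)
print("integral of pdf over (0, 1):", area)
print("F(0.5):", petl_cdf(0.5, alpha, a))
```

The same check can be repeated for other parameter values with $\alpha \ge 1$; for $\alpha < 1$ the integrable singularity at $x = 0$ would need a different quadrature rule.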

In this article, we present the statistical properties of the proposed PETL($\alpha$, $a$) distribution and
demonstrate its use for modeling failure time (in days) data of the air conditioning system of an airplane (as
presented by Linhart and Zucchini, 1986), in comparison with existing distributions defined on the unit interval.

The shapes of the density, distribution, reliability and hazard functions are explored in Section 2. The
statistical properties of the distribution are obtained in Section 3. These include ordinary moments,
conditional moments, the quantile function, order statistics, mean deviation about the mean and the median, entropy,
stress-strength reliability, identifiability, stochastic ordering and differential equations. We derive the maximum
likelihood estimators (MLEs) and their asymptotic confidence intervals in Section 4. A simulation study
is carried out to study the behavior of the mean squared error and mean absolute bias of the MLEs. In
Section 5, the proposed distribution is fitted to the real dataset of air conditioning failure times and
compared with existing unit-interval distributions, namely the Beta, Kumaraswamy,
unit-Gamma, Topp-Leone and Generalized Topp-Leone (given by Shekhawat and Sharma, 2021)
distributions. The findings of the paper are highlighted in Section 6.


2. Shapes of the distribution 


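Figure 1 shows the probability density and cumulative distribution plots for different values of the parameters as per
equations (3) and (4). The associated reliability function is,

$$R(x) = \frac{a - a^{(x(2-x))^{\alpha}}}{a - 1}; \quad 0<x<1,\ \alpha>0,\ a>1. \tag{5}$$

The associated hazard rate is,

$$h(x) = \frac{2\alpha \ln(a)\, x^{\alpha-1}(1-x)(2-x)^{\alpha-1}\, a^{(x(2-x))^{\alpha}}}{a - a^{(x(2-x))^{\alpha}}}; \quad 0<x<1,\ \alpha>0,\ a>1. \tag{6}$$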
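The reliability and hazard functions in equations (5) and (6) can be tabulated on a grid to inspect their shapes. The sketch below is a minimal illustration; the function names, the grid and the three parameter pairs are assumptions chosen for the example, not values taken from the figures.

```python
import math

def petl_reliability(x, alpha, a):
    """Reliability function, equation (5)."""
    w = x * (2.0 - x)
    return (a - a ** (w ** alpha)) / (a - 1.0)

def petl_hazard(x, alpha, a):
    """Hazard rate, equation (6): pdf divided by reliability."""
    w = x * (2.0 - x)
    numerator = (2.0 * alpha * math.log(a) * x ** (alpha - 1.0) * (1.0 - x)
                 * (2.0 - x) ** (alpha - 1.0) * a ** (w ** alpha))
    return numerator / (a - a ** (w ** alpha))

grid = [0.1 * k for k in range(1, 10)]  # interior points of (0, 1)
for alpha, a in [(0.5, 1.5), (1.0, 2.0), (3.0, 1.5)]:  # illustrative values
    values = [petl_hazard(x, alpha, a) for x in grid]
    print(alpha, a, [round(v, 3) for v in values])
```

Printing the hazard on a grid for several ($\alpha$, $a$) pairs is a simple way to see which combinations give increasing, decreasing or bathtub-shaped curves.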


Figure 2 shows the reliability and hazard rate functions for different values of the parameters.

Figure 2: Reliability and Hazard Rate Functions of PETL($\alpha$, $a$).


3. Statistical properties 


3.1 Moments 


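Theorem 3.1 The $r$th ordinary moment of $X \sim$ PETL($\alpha$, $a$) is

$$E(X^r) = \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\ln(a))^i}{i!}(-1)^j\,2^{\alpha+\alpha i-j-1}\binom{\alpha+\alpha i-1}{j}\left[\frac{1}{r+\alpha+\alpha i+j}-\frac{1}{r+\alpha+\alpha i+j+1}\right]. \tag{7}$$

Proof.

$$E(X^r) = \int_0^1 x^r f(x)\,dx = \int_0^1 x^r\,\frac{2\alpha\ln(a)\,x^{\alpha-1}(1-x)(2-x)^{\alpha-1}\,a^{(x(2-x))^{\alpha}}}{a-1}\,dx; \quad \alpha>0,\ a>1.$$

Expanding the term $a^{(x(2-x))^{\alpha}} = \sum_{i=0}^{\infty}(x(2-x))^{\alpha i}(\ln(a))^i/i!$ as a convergent infinite sum, we get,

$$E(X^r) = \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\frac{(\ln(a))^i}{i!}\int_0^1 (1-x)(2-x)^{\alpha+\alpha i-1}x^{r+\alpha+\alpha i-1}\,dx$$

$$= \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\frac{(\ln(a))^i}{i!}\int_0^1\left(x^{r+\alpha+\alpha i-1}-x^{r+\alpha+\alpha i}\right)(2-x)^{\alpha+\alpha i-1}\,dx.$$

Using the series expansion

$$(2-x)^{A} = \sum_{j=0}^{\infty}\binom{A}{j}(-1)^j\,2^{A-j}x^{j},$$

where $A = \alpha+\alpha i-1$, and after simplifying, we get,

$$E(X^r) = \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\ln(a))^i}{i!}(-1)^j\,2^{\alpha+\alpha i-j-1}\binom{\alpha+\alpha i-1}{j}\left[\frac{1}{r+\alpha+\alpha i+j}-\frac{1}{r+\alpha+\alpha i+j+1}\right].$$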
The mean, variance, skewness and kurtosis of the distribution of X can now be easily obtained using 
their respective formulae. 
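The double series of Theorem 3.1 can be compared with direct numerical integration of $x^r f(x)$. The sketch below is a minimal check under stated assumptions: the truncation points `n_i` and `n_j`, the Simpson integrator and the parameter values ($\alpha = 3$, $a = 1.5$) are illustrative choices. The inner sum alternates in sign, so for large $\alpha$ or large $a$ the truncated series can lose accuracy to cancellation.

```python
import math

def petl_pdf(x, alpha, a):
    """PETL pdf, equation (4)."""
    w = x * (2.0 - x)
    return (2.0 * alpha * math.log(a) * x ** (alpha - 1.0) * (1.0 - x)
            * (2.0 - x) ** (alpha - 1.0) * a ** (w ** alpha) / (a - 1.0))

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0

def moment_series(r, alpha, a, n_i=30, n_j=200):
    """Truncated double series of equation (7)."""
    log_a = math.log(a)
    total = 0.0
    coef_i = 1.0                      # (ln a)^i / i!
    for i in range(n_i):
        big_a = alpha + alpha * i - 1.0
        base = r + alpha + alpha * i
        inner = 0.0
        term_j = 2.0 ** big_a         # (-1)^j * C(A, j) * 2^(A - j), starting at j = 0
        for j in range(n_j):
            inner += term_j * (1.0 / (base + j) - 1.0 / (base + j + 1.0))
            term_j *= -(big_a - j) / (2.0 * (j + 1.0))
            if term_j == 0.0:         # the binomial series terminates when A is an integer
                break
        total += coef_i * inner
        coef_i *= log_a / (i + 1.0)
    return 2.0 * alpha * log_a / (a - 1.0) * total

alpha, a = 3.0, 1.5  # illustrative parameter values
for r in (1, 2):
    numeric = simpson(lambda x: x ** r * petl_pdf(x, alpha, a), 0.0, 1.0)
    print(r, moment_series(r, alpha, a), numeric)
```

With the first two moments available, the mean and variance follow as $E(X)$ and $E(X^2) - (E(X))^2$.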
3.2 Conditional moments 
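The expressions for the conditional moments can be derived by using the following theorem.
Theorem 3.2 For $0<t<1$,

$$\int_t^1 x^r f(x)\,dx = \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\ln(a))^i}{i!}(-1)^j\,2^{\alpha+\alpha i-j-1}\binom{\alpha+\alpha i-1}{j}\left[\frac{1-t^{\,r+\alpha+\alpha i+j}}{r+\alpha+\alpha i+j}-\frac{1-t^{\,r+\alpha+\alpha i+j+1}}{r+\alpha+\alpha i+j+1}\right], \tag{8}$$

and the conditional moment is obtained as $E(X^r \mid X>t) = \frac{1}{R(t)}\int_t^1 x^r f(x)\,dx$, where $R(t)$ is the reliability function in equation (5).

Proof. Proceeding in the same way as in Theorem 3.1,

$$\int_t^1 x^r f(x)\,dx = \int_t^1 x^r\,\frac{2\alpha\ln(a)\,x^{\alpha-1}(1-x)(2-x)^{\alpha-1}\,a^{(x(2-x))^{\alpha}}}{a-1}\,dx$$

$$= \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\frac{(\ln(a))^i}{i!}\int_t^1 (1-x)(2-x)^{\alpha+\alpha i-1}x^{r+\alpha+\alpha i-1}\,dx$$

$$= \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\frac{(\ln(a))^i}{i!}\int_t^1\left(x^{r+\alpha+\alpha i-1}-x^{r+\alpha+\alpha i}\right)(2-x)^{\alpha+\alpha i-1}\,dx.$$

Expanding $(2-x)^{\alpha+\alpha i-1}$ as before and integrating term by term over $(t,1)$ gives equation (8).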
3.3 Quantile function
The $p$th quantile $Q(p)$ is obtained by solving $F(Q(p)) = p$. Hence from equation (3), we get,

$$(Q(p))^2 - 2Q(p) + \left(\log_a\left(1 + p(a-1)\right)\right)^{1/\alpha} = 0. \tag{9}$$

This equation can be solved using the quadratic formula or by numerical methods; the root lying in $(0,1)$ is
$Q(p) = 1 - \sqrt{1 - \left(\log_a(1 + p(a-1))\right)^{1/\alpha}}$. We have also
explored the changing behavior of the median with varying values of the parameters $\alpha$ and $a$. For this, we put
$p = 0.5$ in equation (9) and it becomes,

$$(Q(0.5))^2 - 2Q(0.5) + \left(\log_a\left(1 + 0.5(a-1)\right)\right)^{1/\alpha} = 0.$$

The behaviour of the median with respect to the changing parameters is visualized in Figure 3.
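The root of equation (9) that lies in $(0,1)$ gives a closed-form quantile function. The following minimal sketch implements it and checks that $F(Q(p)) = p$; the function names and the parameter values are illustrative assumptions.

```python
import math

def petl_cdf(x, alpha, a):
    """PETL cdf, equation (3)."""
    w = x * (2.0 - x)
    return (a ** (w ** alpha) - 1.0) / (a - 1.0)

def petl_quantile(p, alpha, a):
    """Root in (0, 1) of the quadratic equation (9)."""
    t = math.log(1.0 + p * (a - 1.0), a) ** (1.0 / alpha)
    # max(...) guards against a tiny negative argument caused by rounding near p = 1
    return 1.0 - math.sqrt(max(0.0, 1.0 - t))

alpha, a = 3.0, 1.5  # illustrative parameter values
for p in (0.1, 0.5, 0.9):
    q = petl_quantile(p, alpha, a)
    print(p, q, petl_cdf(q, alpha, a))
```

Setting `p = 0.5` returns the median, which can be evaluated over a grid of ($\alpha$, $a$) values to study how it changes with the parameters.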


Figure 3: Variation in median with respect to changing parameters ($\alpha$, $a$).


3.4 Order statistics


Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the proposed distribution and let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$
denote the corresponding order statistics. It is well known that the pdf $f_{r:n}(x)$ of the $r$th (for $r = 1, 2, \ldots, n$) order
statistic $X_{r:n}$, when the population cdf and pdf are $F(x)$ and $f(x)$ respectively, is given as,

$$f_{r:n}(x) = \frac{n!}{(r-1)!\,(n-r)!}F^{r-1}(x)\,[1-F(x)]^{n-r}f(x) = \frac{n!}{(r-1)!\,(n-r)!}\sum_{l=0}^{n-r}\binom{n-r}{l}(-1)^l F^{r+l-1}(x)\,f(x),$$

and the cdf $F_{r:n}(x)$ of the $r$th order statistic as,

$$F_{r:n}(x) = \sum_{j=r}^{n}\binom{n}{j}F^{j}(x)\,[1-F(x)]^{n-j} = \sum_{j=r}^{n}\sum_{l=0}^{n-j}\binom{n}{j}\binom{n-j}{l}(-1)^l F^{j+l}(x).$$

Thus, using equations (3) and (4), the pdf $f_{r:n}(x)$ and cdf $F_{r:n}(x)$ of the $r$th order statistic based on a random
sample of size $n$ from the proposed distribution can be easily obtained and are given below as,

$$f_{r:n}(x) = \frac{n!\,2\alpha\ln(a)\,x^{\alpha-1}(1-x)(2-x)^{\alpha-1}}{(r-1)!\,(n-r)!}\sum_{l=0}^{n-r}\sum_{k=0}^{r+l-1}\frac{(-1)^{r+k-1}}{(a-1)^{r+l}}\binom{n-r}{l}\binom{r+l-1}{k}a^{(k+1)(x(2-x))^{\alpha}} \tag{10}$$

and

$$F_{r:n}(x) = \sum_{j=r}^{n}\sum_{l=0}^{n-j}\sum_{k=0}^{j+l}\frac{(-1)^{j+k}}{(a-1)^{j+l}}\binom{n}{j}\binom{n-j}{l}\binom{j+l}{k}a^{k(x(2-x))^{\alpha}}. \tag{11}$$
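The expanded density of the $r$th order statistic can be checked against the general formula $\frac{n!}{(r-1)!(n-r)!}F^{r-1}(x)[1-F(x)]^{n-r}f(x)$. The sketch below is a minimal comparison; the function names and the values $n = 5$, $r = 2$, $\alpha = 3$, $a = 1.5$ are illustrative assumptions. The expansion used is obtained by applying the binomial theorem to $[1-F(x)]^{n-r}$ and to $(a^{(x(2-x))^{\alpha}} - 1)^{r+l-1}$.

```python
import math

def petl_cdf(x, alpha, a):
    """PETL cdf, equation (3)."""
    w = x * (2.0 - x)
    return (a ** (w ** alpha) - 1.0) / (a - 1.0)

def petl_pdf(x, alpha, a):
    """PETL pdf, equation (4)."""
    w = x * (2.0 - x)
    return (2.0 * alpha * math.log(a) * x ** (alpha - 1.0) * (1.0 - x)
            * (2.0 - x) ** (alpha - 1.0) * a ** (w ** alpha) / (a - 1.0))

def order_pdf_direct(x, r, n, alpha, a):
    """General order-statistic density written with F and f."""
    const = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    cdf = petl_cdf(x, alpha, a)
    return const * cdf ** (r - 1) * (1.0 - cdf) ** (n - r) * petl_pdf(x, alpha, a)

def order_pdf_expanded(x, r, n, alpha, a):
    """Double-sum expansion in powers of a**((x(2-x))**alpha)."""
    u = (x * (2.0 - x)) ** alpha
    prefactor = (math.factorial(n) * 2.0 * alpha * math.log(a) * x ** (alpha - 1.0)
                 * (1.0 - x) * (2.0 - x) ** (alpha - 1.0)
                 / (math.factorial(r - 1) * math.factorial(n - r)))
    total = 0.0
    for l in range(n - r + 1):
        for k in range(r + l):
            sign = -1.0 if (r + k - 1) % 2 else 1.0
            total += (sign / (a - 1.0) ** (r + l) * math.comb(n - r, l)
                      * math.comb(r + l - 1, k) * a ** ((k + 1) * u))
    return prefactor * total

print(order_pdf_direct(0.6, 2, 5, 3.0, 1.5))
print(order_pdf_expanded(0.6, 2, 5, 3.0, 1.5))
```

For large $n$ the alternating sums become ill-conditioned, so the direct formula is the safer choice in numerical work.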


3.5 Mean deviation 
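The mean deviation about the mean is defined by,

$$\delta_1(X) = \int_0^1 |x-\mu|\,f(x)\,dx,$$

where $\mu$ is the mean, which can be rewritten as $\delta_1(X) = \int_0^{\mu}(\mu-x)f(x)\,dx + \int_{\mu}^{1}(x-\mu)f(x)\,dx$. Using
integration by parts and putting $E(X) = \int_0^1 x f(x)\,dx = \mu$, it simplifies to

$$\delta_1(X) = 2\mu F(\mu) - 2\mu + 2\int_{\mu}^{1}x f(x)\,dx,$$

where $F(\cdot)$ denotes the proposed cdf. Hence, from Theorem 3.2 with $r = 1$ and $t = \mu$,

$$\int_{\mu}^{1}x f(x)\,dx = \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\ln(a))^i}{i!}(-1)^j\,2^{\alpha+\alpha i-j-1}\binom{\alpha+\alpha i-1}{j}\left[\frac{1-\mu^{1+\alpha+\alpha i+j}}{1+\alpha+\alpha i+j}-\frac{1-\mu^{2+\alpha+\alpha i+j}}{2+\alpha+\alpha i+j}\right],$$

and thus,

$$\delta_1(X) = 2\mu F(\mu) - 2\mu + \frac{4\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\ln(a))^i}{i!}(-1)^j\,2^{\alpha+\alpha i-j-1}\binom{\alpha+\alpha i-1}{j}\left[\frac{1-\mu^{1+\alpha+\alpha i+j}}{1+\alpha+\alpha i+j}-\frac{1-\mu^{2+\alpha+\alpha i+j}}{2+\alpha+\alpha i+j}\right]. \tag{12}$$

The mean deviation about the median is defined as,

$$\delta_2(X) = \int_0^1 |x-M|\,f(x)\,dx = \int_0^{M}(M-x)f(x)\,dx + \int_{M}^{1}(x-M)f(x)\,dx,$$

where $M$ stands for the median. After simplification, by putting $F(M) = \frac{1}{2}$, we get,

$$\delta_2(X) = -\mu + 2\int_{M}^{1}x f(x)\,dx.$$

By Theorem 3.2 with $r = 1$ and $t = M$,

$$\int_{M}^{1}x f(x)\,dx = \frac{2\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\ln(a))^i}{i!}(-1)^j\,2^{\alpha+\alpha i-j-1}\binom{\alpha+\alpha i-1}{j}\left[\frac{1-M^{1+\alpha+\alpha i+j}}{1+\alpha+\alpha i+j}-\frac{1-M^{2+\alpha+\alpha i+j}}{2+\alpha+\alpha i+j}\right].$$

Thus, the expression for the mean deviation about the median can easily be written as,

$$\delta_2(X) = -\mu + \frac{4\alpha\ln(a)}{a-1}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\ln(a))^i}{i!}(-1)^j\,2^{\alpha+\alpha i-j-1}\binom{\alpha+\alpha i-1}{j}\left[\frac{1-M^{1+\alpha+\alpha i+j}}{1+\alpha+\alpha i+j}-\frac{1-M^{2+\alpha+\alpha i+j}}{2+\alpha+\alpha i+j}\right]. \tag{13}$$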
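The two simplified forms, $\delta_1(X) = 2\mu F(\mu) - 2\mu + 2\int_{\mu}^{1}x f(x)\,dx$ and $\delta_2(X) = -\mu + 2\int_{M}^{1}x f(x)\,dx$, can be verified numerically against the defining integrals. The sketch below is a minimal check; it uses quadrature rather than the series, and the function names and parameter values ($\alpha = 3$, $a = 1.5$) are illustrative assumptions.

```python
import math

def petl_cdf(x, alpha, a):
    w = x * (2.0 - x)
    return (a ** (w ** alpha) - 1.0) / (a - 1.0)

def petl_pdf(x, alpha, a):
    w = x * (2.0 - x)
    return (2.0 * alpha * math.log(a) * x ** (alpha - 1.0) * (1.0 - x)
            * (2.0 - x) ** (alpha - 1.0) * a ** (w ** alpha) / (a - 1.0))

def petl_quantile(p, alpha, a):
    t = math.log(1.0 + p * (a - 1.0), a) ** (1.0 / alpha)
    return 1.0 - math.sqrt(max(0.0, 1.0 - t))

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0

ALPHA, A = 3.0, 1.5  # illustrative parameter values

def upper_partial_mean(t):
    """Integral of x f(x) over (t, 1)."""
    return simpson(lambda x: x * petl_pdf(x, ALPHA, A), t, 1.0)

def mean_abs_deviation(c):
    """Defining integral of |x - c| f(x), split at c to keep the integrand smooth."""
    left = simpson(lambda x: (c - x) * petl_pdf(x, ALPHA, A), 0.0, c)
    right = simpson(lambda x: (x - c) * petl_pdf(x, ALPHA, A), c, 1.0)
    return left + right

mu = upper_partial_mean(0.0)
median = petl_quantile(0.5, ALPHA, A)
delta1 = 2.0 * mu * petl_cdf(mu, ALPHA, A) - 2.0 * mu + 2.0 * upper_partial_mean(mu)
delta2 = -mu + 2.0 * upper_partial_mean(median)
print(delta1, mean_abs_deviation(mu))
print(delta2, mean_abs_deviation(median))
```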


3.6 Entropy 


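An entropy is a measure of the randomness or uncertainty of a system. We derive the expression of the Renyi
entropy (see Renyi, 1961), which generalizes the Hartley and Shannon entropies. Let $X$ have the pdf $f(x)$; then the
Renyi entropy is defined as,

$$I_R(\beta) = \frac{1}{1-\beta}\log\left[\int f^{\beta}(x)\,dx\right], \quad \text{where } \beta>0 \text{ and } \beta\neq 1.$$

From equation (4) we get,

$$\int_0^1 f^{\beta}(x)\,dx = \left(\frac{2\alpha\ln(a)}{a-1}\right)^{\beta}\int_0^1\left[(1-x)(x(2-x))^{\alpha-1}\right]^{\beta}a^{\beta(x(2-x))^{\alpha}}\,dx.$$

After simplification, expanding $a^{\beta(x(2-x))^{\alpha}}$ and $(2-x)^{\alpha\beta+\alpha i-\beta}$ in series and integrating term by term,

$$\int_0^1 f^{\beta}(x)\,dx = \left(\frac{2\alpha\ln(a)}{a-1}\right)^{\beta}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\beta\ln(a))^i}{i!}(-1)^j\,2^{\alpha\beta+\alpha i-\beta-j}\binom{\alpha\beta+\alpha i-\beta}{j}B(\alpha\beta+\alpha i-\beta+j+1,\ \beta+1),$$

where $B(\cdot,\cdot)$ denotes the beta function; hence we get,

$$I_R(\beta) = \frac{1}{1-\beta}\log\left\{\left(\frac{2\alpha\ln(a)}{a-1}\right)^{\beta}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{(\beta\ln(a))^i}{i!}(-1)^j\,2^{\alpha\beta+\alpha i-\beta-j}\binom{\alpha\beta+\alpha i-\beta}{j}B(\alpha\beta+\alpha i-\beta+j+1,\ \beta+1)\right\}. \tag{14}$$


3.7 Stress-strength reliability


The stress-strength reliability has been widely used in reliability analysis as a measure of the system
performance under stress. In terms of probability, the stress-strength reliability can be obtained as:

$$R = P[X > Y],$$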


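where $X$ denotes the strength of the system and $Y$ denotes the stress applied to the system. The probability $R$ can
be used to compare two random variables encountered in various applied disciplines.

The stress-strength reliability $R$ for independent PETL random variables $X \sim$ PETL($\alpha_1$, $a_1$) and $Y \sim$ PETL($\alpha_2$, $a_2$) is
given by,

$$R = \int_0^1 f_X(x;\alpha_1,a_1)\,F_Y(x;\alpha_2,a_2)\,dx.$$

Here, we assume that $a_1 = a_2 = a$. Substituting $v = (x(2-x))^{\alpha_1}$ and expanding $a^{(x(2-x))^{\alpha_2}}$ in series, on simplifying, we have,

$$R = \frac{\ln(a)}{(a-1)^2}\sum_{k=0}^{\infty}\frac{(\ln(a))^k}{k!}\int_0^1 v^{\alpha_2 k/\alpha_1}a^{v}\,dv - \frac{1}{a-1}$$

$$= \frac{\ln(a)}{(a-1)^2}\sum_{k=0}^{\infty}\frac{(\ln(a))^k}{k!}\,(-\ln(a))^{-\frac{\alpha_1+\alpha_2 k}{\alpha_1}}\left[\Gamma\!\left(\frac{\alpha_1+\alpha_2 k}{\alpha_1}\right)-\Gamma\!\left(\frac{\alpha_1+\alpha_2 k}{\alpha_1},\,-\ln(a)\right)\right] - \frac{1}{a-1}, \tag{15}$$

where $\Gamma(\cdot)$ represents the gamma integral and $\Gamma(\cdot,\cdot)$ represents the incomplete gamma integral.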
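The stress-strength reliability with a common base $a_1 = a_2 = a$ can be checked numerically. The sketch below compares direct quadrature of $\int_0^1 f_X(x)F_Y(x)\,dx$ with a series form in which each inner integral $\int_0^1 v^{s-1}a^{v}\,dv$ is itself expanded as $\sum_m (\ln a)^m/(m!\,(s+m))$, so that no complex-valued incomplete gamma evaluation is needed. The function names, truncation points and parameter values are illustrative assumptions.

```python
import math

def petl_cdf(x, alpha, a):
    w = x * (2.0 - x)
    return (a ** (w ** alpha) - 1.0) / (a - 1.0)

def petl_pdf(x, alpha, a):
    w = x * (2.0 - x)
    return (2.0 * alpha * math.log(a) * x ** (alpha - 1.0) * (1.0 - x)
            * (2.0 - x) ** (alpha - 1.0) * a ** (w ** alpha) / (a - 1.0))

def simpson(func, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0

def stress_strength_numeric(alpha1, alpha2, a):
    """R = P(X > Y) by quadrature, X ~ PETL(alpha1, a), Y ~ PETL(alpha2, a)."""
    return simpson(lambda x: petl_pdf(x, alpha1, a) * petl_cdf(x, alpha2, a), 0.0, 1.0)

def stress_strength_series(alpha1, alpha2, a, n_k=60, n_m=60):
    """Series form of R with a common base a."""
    c = math.log(a)
    total = 0.0
    coef_k = 1.0                       # (ln a)^k / k!
    for k in range(n_k):
        s = (alpha1 + alpha2 * k) / alpha1
        inner = 0.0                    # integral of v^(s-1) * a^v over (0, 1)
        coef_m = 1.0                   # (ln a)^m / m!
        for m in range(n_m):
            inner += coef_m / (s + m)
            coef_m *= c / (m + 1.0)
        total += coef_k * inner
        coef_k *= c / (k + 1.0)
    return c / (a - 1.0) ** 2 * total - 1.0 / (a - 1.0)

print(stress_strength_numeric(3.0, 2.0, 1.5), stress_strength_series(3.0, 2.0, 1.5))
print(stress_strength_series(2.0, 2.0, 1.5))  # identical distributions
```

When $X$ and $Y$ have identical distributions the value reduces to $1/2$, which gives a convenient sanity check.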


3.8 Identifiability 


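A family of distributions is said to be identifiable in its parameters if equality of the distributions of two members of the
family, i.e., $f_1(x;\Theta_1) = f_2(x;\Theta_2)$ for all $x$, implies $\Theta_1 = \Theta_2$. Theorem 1 of Basu and Ghosh, 1980, states
that the density ratio $f_1(x;\Theta_1)/f_2(x;\Theta_2)$ of two distinct members of the family defined on the interval $(a,b)$ either
converges to 0 or diverges to $\infty$ as $x \to a$. For the PETL distribution, we have,

$$\lim_{x\to 0}\frac{f_1(x;\alpha_1,a_1)}{f_2(x;\alpha_2,a_2)} = \lim_{x\to 0}\frac{\alpha_1\ln(a_1)}{\alpha_2\ln(a_2)}\,\frac{a_2-1}{a_1-1}\,(x(2-x))^{\alpha_1-\alpha_2}\,a_1^{(x(2-x))^{\alpha_1}}a_2^{-(x(2-x))^{\alpha_2}},$$

$$\lim_{x\to 0}\frac{f_1(x;\alpha_1,a_1)}{f_2(x;\alpha_2,a_2)} = \begin{cases} 0 & \text{if } \alpha_1>\alpha_2,\\ \infty & \text{if } \alpha_1<\alpha_2,\\ \dfrac{\ln(a_1)(a_2-1)}{\ln(a_2)(a_1-1)} & \text{if } \alpha_1=\alpha_2. \end{cases} \tag{16}$$

In the last case the limit equals 1 only when $a_1 = a_2$, since $\ln(a)/(a-1)$ is strictly decreasing in $a$.
Thus, the parameters of the PETL distribution are identified, since two members of the family have
different densities for different parameter values.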


3.9 Stochastic ordering 


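A random variable $X$ is said to be stochastically greater than $Y$ ($X \ge_{st} Y$) if $F_X(x) \le F_Y(x)$ for all $x$. In a similar
manner, $X$ is said to be greater than $Y$ in the
* hazard rate order ($X \ge_{hr} Y$) if $h_X(x) \le h_Y(x)$ for all $x$.
* mean residual life order ($X \ge_{mrl} Y$) if $m_X(x) \ge m_Y(x)$ for all $x$.
* likelihood ratio order ($X \ge_{lr} Y$) if $f_X(x)/f_Y(x)$ is an increasing function of $x$.

Theorem 3.3 Let $X \sim$ PETL($\alpha_1$, $a_1$) and $Y \sim$ PETL($\alpha_2$, $a_2$). Then, we have the following conditions:

1. For $a_1 = a_2 = a$ and $\alpha_1 \ge \alpha_2$: ($X \ge_{lr} Y$), ($X \ge_{hr} Y$), ($X \ge_{mrl} Y$) and ($X \ge_{st} Y$).
2. For $\alpha_1 = \alpha_2 = \alpha$ and $a_1 \ge a_2$: ($X \ge_{lr} Y$), ($X \ge_{hr} Y$), ($X \ge_{mrl} Y$) and ($X \ge_{st} Y$).

Proof. The likelihood ratio of the random variables $X$ and $Y$ is given by,

$$\frac{f_X(x;\alpha_1,a_1)}{f_Y(x;\alpha_2,a_2)} = \frac{\alpha_1\ln(a_1)}{\alpha_2\ln(a_2)}\,\frac{a_2-1}{a_1-1}\,(x(2-x))^{\alpha_1-\alpha_2}\,a_1^{(x(2-x))^{\alpha_1}}a_2^{-(x(2-x))^{\alpha_2}}.$$

We first take $a_1 = a_2 = a$ and differentiate the likelihood ratio with respect to $x$. It gives,

$$\frac{d}{dx}\left(\frac{f_X(x;\alpha_1,a)}{f_Y(x;\alpha_2,a)}\right) = \frac{2\alpha_1(1-x)(x(2-x))^{\alpha_1-\alpha_2}\,a^{(x(2-x))^{\alpha_1}}a^{-(x(2-x))^{\alpha_2}}}{\alpha_2\,x(2-x)}\left[\ln(a)\alpha_1(x(2-x))^{\alpha_1}-\ln(a)\alpha_2(x(2-x))^{\alpha_2}+(\alpha_1-\alpha_2)\right]. \tag{17}$$

For $\alpha_1 > \alpha_2$, $\frac{d}{dx}\left(\frac{f_X(x;\alpha_1,a)}{f_Y(x;\alpha_2,a)}\right) > 0$ for all $x$. Clearly, the likelihood ratio is an increasing function of $x$. Hence,
for $a_1 = a_2 = a$, ($X \ge_{lr} Y$).
By Shaked and Shanthikumar, 1994, ($X \ge_{lr} Y$) $\Rightarrow$ ($X \ge_{hr} Y$) $\Rightarrow$ ($X \ge_{mrl} Y$) and ($X \ge_{st} Y$).

Similarly, for $\alpha_1 = \alpha_2 = \alpha$ we have,

$$\frac{d}{dx}\left(\frac{f_X(x;\alpha,a_1)}{f_Y(x;\alpha,a_2)}\right) = \frac{2\alpha\ln(a_1)(a_2-1)\left(\ln(a_1)-\ln(a_2)\right)(1-x)(x(2-x))^{\alpha}\,a_1^{(x(2-x))^{\alpha}}}{\ln(a_2)(a_1-1)\,x(2-x)\,a_2^{(x(2-x))^{\alpha}}}. \tag{18}$$

For $a_1 > a_2$, $\frac{d}{dx}\left(\frac{f_X(x;\alpha,a_1)}{f_Y(x;\alpha,a_2)}\right) > 0$ for all $x$. Clearly, the likelihood ratio is an increasing function of $x$. Hence,
for $\alpha_1 = \alpha_2 = \alpha$, ($X \ge_{lr} Y$).

By Shaked and Shanthikumar, 1994, ($X \ge_{lr} Y$) $\Rightarrow$ ($X \ge_{hr} Y$) $\Rightarrow$ ($X \ge_{mrl} Y$) and ($X \ge_{st} Y$).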


3.10 Ordinary differential equations for density and survival functions 


We provide the first order differential equations of the density and survival functions of the PETL distribution.

The first order derivative of the pdf is,

$$f'(x) = \frac{2\alpha\ln(a)}{a-1}\,x^{\alpha-1}(1-x)(2-x)^{\alpha-1}a^{(x(2-x))^{\alpha}}\left[\frac{2\alpha\ln(a)(x(2-x))^{\alpha}(1-x)^2+(2\alpha-1)(1-x)^2-1}{x(2-x)(1-x)}\right].$$

Thus, the first order ODE for the density function is,

$$y' - y\left[\frac{2\alpha\ln(a)(x(2-x))^{\alpha}(1-x)^2+(2\alpha-1)(1-x)^2-1}{x(2-x)(1-x)}\right] = 0, \tag{19}$$

where $y = f(x)$ and $y' = \frac{d}{dx}f(x)$. For some parameter values, the first order ODEs of the pdf are given in Table 1.

The survival function of the PETL distribution is given by $z = S(x) = 1 - \frac{a^{(x(2-x))^{\alpha}}-1}{a-1}$. On differentiating it w.r.t.
$x$, we have,

$$z' = \frac{-2\alpha\ln(a)(1-x)(x(2-x))^{\alpha}\,a^{(x(2-x))^{\alpha}}}{(a-1)\,x(2-x)}.$$

Thus, on simplification, using $a^{(x(2-x))^{\alpha}} = (1-z)(a-1)+1$, we have,

$$z' + \frac{2\alpha\ln(a)(1-x)\log_a[(1-z)(a-1)+1]\,[(1-z)(a-1)+1]}{(a-1)\,x(2-x)} = 0, \tag{20}$$

where $z = S(x)$ and $z' = \frac{d}{dx}S(x)$. For some parameter values, the first order ODEs of the survival function are
given in Table 2.

Table 1: Ordinary Differential Equations of PETL density.

$\alpha$ | $a$ | First Order ODE
1 | 2 | $y' - y\left[\frac{2\ln(2)(x(2-x))(1-x)^2+(1-x)^2-1}{x(2-x)(1-x)}\right] = 0$
2 | 2 | $y' - y\left[\frac{4\ln(2)(x(2-x))^2(1-x)^2+3(1-x)^2-1}{x(2-x)(1-x)}\right] = 0$
3 | 2 | $y' - y\left[\frac{6\ln(2)(x(2-x))^3(1-x)^2+5(1-x)^2-1}{x(2-x)(1-x)}\right] = 0$

Table 2: Ordinary Differential Equations of the PETL survival function.

$\alpha$ | $a$ | First Order ODE
1 | 2 | $z' + \frac{2\ln(2)(1-x)(2-z)\log_2(2-z)}{x(2-x)} = 0$
2 | 2 | $z' + \frac{4\ln(2)(1-x)(2-z)\log_2(2-z)}{x(2-x)} = 0$
3 | 2 | $z' + \frac{6\ln(2)(1-x)(2-z)\log_2(2-z)}{x(2-x)} = 0$

4. Maximum likelihood estimation and simulation 
4.1 Point estimation 


The maximum likelihood estimates of the parameters $\alpha$ and $a$ of the proposed distribution are obtained by maximizing
the logarithm of the likelihood function. For a random sample $x_1, x_2, \ldots, x_n$, the log-likelihood function is,

$$\log L = n\log\left(\frac{\ln(a)}{a-1}\right) + n\log(2\alpha) + \sum_{i=1}^{n}\log(1-x_i) + (\alpha-1)\sum_{i=1}^{n}\log(x_i(2-x_i)) + \ln(a)\sum_{i=1}^{n}(x_i(2-x_i))^{\alpha}. \tag{21}$$

Differentiating it with respect to the parameters we get,

$$\frac{\partial\log L}{\partial\alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n}\log(x_i(2-x_i)) + \ln(a)\sum_{i=1}^{n}(x_i(2-x_i))^{\alpha}\log(x_i(2-x_i))$$

and

$$\frac{\partial\log L}{\partial a} = \frac{n\left(a - a\log(a) - 1\right)}{a(a-1)\log(a)} + \frac{1}{a}\sum_{i=1}^{n}(x_i(2-x_i))^{\alpha}.$$

After equating these equations to zero, we get two non-linear equations. Solving these simultaneously,
we get the MLEs $\hat\alpha$ and $\hat a$ of the parameters $\alpha$ and $a$ respectively. It may be noted here that these equations cannot be
solved analytically. However, they can be solved numerically, and we propose
the use of the Newton-Raphson method. For the choice of an initial guess the contour plot technique is used.
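The log-likelihood and its first derivatives can be coded directly and checked against finite differences before they are passed to an optimizer. The sketch below is a minimal illustration using the eleven observations quoted in Section 5. The function names and the step size are assumptions, the evaluation point (5, 1.65) is the pair of initial values quoted in Section 5, and the last line evaluates the log-likelihood at the PETL estimates listed in Table 4 so that it can be compared with the log(L) value reported there.

```python
import math

# Failure-time data quoted in Section 5 (Linhart and Zucchini, 1986).
data = [0.46, 0.46, 0.46, 0.5, 0.58, 0.58, 0.58, 0.67, 0.67, 0.83, 0.87]

def log_likelihood(alpha, a, xs):
    """PETL log-likelihood for a sample xs in (0, 1)."""
    n = len(xs)
    w = [x * (2.0 - x) for x in xs]
    return (n * math.log(math.log(a) / (a - 1.0)) + n * math.log(2.0 * alpha)
            + sum(math.log(1.0 - x) for x in xs)
            + (alpha - 1.0) * sum(math.log(v) for v in w)
            + math.log(a) * sum(v ** alpha for v in w))

def score(alpha, a, xs):
    """Partial derivatives of the log-likelihood with respect to alpha and a."""
    n = len(xs)
    w = [x * (2.0 - x) for x in xs]
    d_alpha = (n / alpha + sum(math.log(v) for v in w)
               + math.log(a) * sum(v ** alpha * math.log(v) for v in w))
    d_a = (n * (a - a * math.log(a) - 1.0) / (a * (a - 1.0) * math.log(a))
           + sum(v ** alpha for v in w) / a)
    return d_alpha, d_a

alpha0, a0, h = 5.0, 1.65, 1e-6
fd_alpha = (log_likelihood(alpha0 + h, a0, data) - log_likelihood(alpha0 - h, a0, data)) / (2 * h)
fd_a = (log_likelihood(alpha0, a0 + h, data) - log_likelihood(alpha0, a0 - h, data)) / (2 * h)
print(score(alpha0, a0, data), (fd_alpha, fd_a))
print(log_likelihood(8.4832, 0.0614, data))
```

Agreement between the analytic score and the finite differences is a useful safeguard before running Newton-Raphson iterations, which are sensitive to errors in the derivatives.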


4.2 Asymptotic confidence intervals 


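For large samples, we can obtain the confidence intervals based on the diagonal elements of the inverse of the Fisher information
matrix, $I^{-1}(\hat\alpha, \hat a)$, which provide the estimated asymptotic variances of $\hat\alpha$ and $\hat a$ respectively. Thus
the two-sided $100(1-\beta)\%$ confidence intervals of $\alpha$ and $a$ can be defined as $\hat\alpha \pm Z_{\beta/2}\sqrt{\mathrm{var}(\hat\alpha)}$ and $\hat a \pm Z_{\beta/2}\sqrt{\mathrm{var}(\hat a)}$
respectively, where $Z_{\beta/2}$ denotes the upper $(\beta/2)$th point of the standard normal distribution. The Fisher information matrix
can be estimated by,

$$I(\hat\alpha, \hat a) = \begin{bmatrix} -\dfrac{\partial^2\log L}{\partial\alpha^2} & -\dfrac{\partial^2\log L}{\partial\alpha\,\partial a} \\ -\dfrac{\partial^2\log L}{\partial\alpha\,\partial a} & -\dfrac{\partial^2\log L}{\partial a^2} \end{bmatrix}_{(\hat\alpha,\hat a)} \tag{22}$$

where,

$$\frac{\partial^2\log L}{\partial\alpha^2} = -\frac{n}{\alpha^2} + \log(a)\sum_{i=1}^{n}(x_i(2-x_i))^{\alpha}\left[\log(x_i(2-x_i))\right]^2,$$

$$\frac{\partial^2\log L}{\partial a^2} = \frac{n\left[a^2(\log(a))^2 + (-a^2+2a-1)\log(a) - a^2 + 2a - 1\right]}{a^2(a-1)^2(\log(a))^2} - \frac{1}{a^2}\sum_{i=1}^{n}(x_i(2-x_i))^{\alpha},$$

$$\frac{\partial^2\log L}{\partial\alpha\,\partial a} = \frac{1}{a}\sum_{i=1}^{n}(x_i(2-x_i))^{\alpha}\log(x_i(2-x_i)).$$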
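The asymptotic intervals can be computed by evaluating the second derivatives at the estimates, inverting the resulting $2\times2$ matrix and applying the normal quantile. The sketch below is a minimal illustration; it uses the data quoted in Section 5 and the PETL estimates listed in Table 4 as inputs, and the function names and the level $\beta = 0.05$ are assumptions. No particular interval is claimed here as a result of the original study.

```python
import math
from statistics import NormalDist

# Failure-time data quoted in Section 5 (Linhart and Zucchini, 1986).
data = [0.46, 0.46, 0.46, 0.5, 0.58, 0.58, 0.58, 0.67, 0.67, 0.83, 0.87]

def observed_information(alpha, a, xs):
    """Matrix of negative second derivatives of the PETL log-likelihood."""
    n = len(xs)
    w = [x * (2.0 - x) for x in xs]
    log_a = math.log(a)
    d2_alpha = -n / alpha ** 2 + log_a * sum(v ** alpha * math.log(v) ** 2 for v in w)
    numerator = a ** 2 * log_a ** 2 + (-a ** 2 + 2.0 * a - 1.0) * log_a - a ** 2 + 2.0 * a - 1.0
    d2_a = n * numerator / (a ** 2 * (a - 1.0) ** 2 * log_a ** 2) - sum(v ** alpha for v in w) / a ** 2
    d2_mixed = sum(v ** alpha * math.log(v) for v in w) / a
    return [[-d2_alpha, -d2_mixed], [-d2_mixed, -d2_a]]

def wald_intervals(alpha_hat, a_hat, xs, beta=0.05):
    """Two-sided 100(1 - beta)% asymptotic intervals for alpha and a."""
    info = observed_information(alpha_hat, a_hat, xs)
    det = info[0][0] * info[1][1] - info[0][1] ** 2
    var_alpha = info[1][1] / det       # diagonal elements of the inverse matrix
    var_a = info[0][0] / det
    z = NormalDist().inv_cdf(1.0 - beta / 2.0)
    half_alpha = z * math.sqrt(var_alpha)
    half_a = z * math.sqrt(var_a)
    return (alpha_hat - half_alpha, alpha_hat + half_alpha), (a_hat - half_a, a_hat + half_a)

ci_alpha, ci_a = wald_intervals(8.4832, 0.0614, data)
print(ci_alpha, ci_a)
```

Because the interval is symmetric around the estimate, it can extend beyond the parameter space when a standard error is large relative to the estimate; a transformation such as working with $\log(a)$ is a common remedy.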


4.3 Random number generation 


The steps to generate random numbers from the proposed PETL($\alpha$, $a$) distribution are,
1. Select $n$, $\alpha$ and $a$.
2. Generate a standard uniform random number, $u \sim U(0,1)$.
3. Using the quantile function, compute $x = 1 - \sqrt{1 - \left\{\log_a\left(1+u(a-1)\right)\right\}^{1/\alpha}}$.
4. Repeat steps 2 and 3 $n$ times to get a sample of size $n$, $\{x_1, x_2, \ldots, x_n\}$, from PETL($\alpha$, $a$).


Illustration:


1. We fix $n = 10$, $\alpha = 2$ and $a = 3$.
The random sample generated is: {0.5589, 0.4685, 0.8239, 0.8712, 0.4002, 0.5428, 0.6220, 0.8125, 0.2229,
0.5197}.

2. We fix $n = 20$, $\alpha = 0.5$ and $a = 1.5$.
The random sample generated is: {0.7579, 0.2430, 0.7586, 0.0027, 0.5100, 0.5114, 0.1640, 0.0124, 0.5053,
0.0049, 0.0014, 0.6928, 0.0062, 0.0211, 0.6638, 0.6572, 0.0193, 0.0399, 0.3028, 0.4389}.
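The four steps above amount to inverse-transform sampling and can be sketched in a few lines. The function names and the seed are illustrative assumptions; because the random number stream differs, the output will not reproduce the samples listed in the illustration.

```python
import math
import random

def petl_quantile(u, alpha, a):
    """Step 3: closed-form quantile of PETL(alpha, a)."""
    t = math.log(1.0 + u * (a - 1.0), a) ** (1.0 / alpha)
    return 1.0 - math.sqrt(max(0.0, 1.0 - t))  # guard against rounding near u = 1

def petl_sample(n, alpha, a, rng):
    """Steps 2 to 4: transform n standard uniform draws."""
    return [petl_quantile(rng.random(), alpha, a) for _ in range(n)]

rng = random.Random(2023)              # fixed seed so the run is repeatable
sample = petl_sample(10, 2.0, 3.0, rng)
print([round(x, 4) for x in sample])
```

Passing an explicit `random.Random` instance keeps the generator state local, which makes simulation studies easier to reproduce.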


4.4 Simulation study 


We compute the MSE and Mean Absolute Bias of the MLEs on the basis of 10000 simulated samples for the
given values of the parameters and sample size $n$. The MSE and Mean Absolute Bias (AB) are computed using
the following formulae,

$$\mathrm{MSE}(\hat\alpha) = \frac{1}{10000}\sum_{j=1}^{10000}(\hat\alpha_j-\alpha)^2, \quad \mathrm{AB}(\hat\alpha) = \frac{1}{10000}\sum_{j=1}^{10000}|\hat\alpha_j-\alpha|,$$

$$\mathrm{MSE}(\hat a) = \frac{1}{10000}\sum_{j=1}^{10000}(\hat a_j-a)^2, \quad \mathrm{AB}(\hat a) = \frac{1}{10000}\sum_{j=1}^{10000}|\hat a_j-a|.$$

We present the results of the simulation study for the parameters ($\alpha$, $a$) = (3, 1.5) in Table 3 and the results
are visualized in Figure 4.

The MSE and mean absolute bias of $\hat\alpha$ decrease steadily with increasing sample size. Those of $\hat a$ are much
larger and fluctuate, but show an overall decreasing trend, which is in line with the consistency of the MLEs.




Table 3: MSE and Mean Absolute Bias (AB) of ($\hat\alpha$, $\hat a$) for simulated samples with ($\alpha$, $a$) = (3, 1.5).


Sample Size   MSE($\hat\alpha$)   AB($\hat\alpha$)   MSE($\hat a$)   AB($\hat a$)
50            1.0764   0.8013   28406.9643   19.0250
100           0.5616   0.5797   17420.9930   12.2759
150           0.3631   0.4640   14402.0671   8.0635
200           0.2595   0.3969   5507.2345    3.8488
250           0.2075   0.3565   3556.9818    2.7741
300           0.1694   0.3222   4811.6824    2.6653
350           0.1392   0.2952   5019.0465    1.5187
400           0.1254   0.2797   592.9395     0.8561
450           0.1063   0.2575   55.3961      0.5489
500           0.1001   0.2501   1737.2620    0.9461

Figure 4: MSE and Mean Absolute Bias of $\hat\alpha$ and $\hat a$ for simulated samples with $\alpha = 3$ and $a = 1.5$.


5. Real data fitting 


In this section, we analyze a real dataset in order to illustrate the performance of the proposed PETL($\alpha$, $a$)
distribution. The data set that is considered here is given by Linhart and Zucchini, 1986, and it represents the
failure time (in days) of the air conditioning system of an airplane as: 0.46, 0.46, 0.46, 0.5, 0.58, 0.58, 0.58,
0.67, 0.67, 0.83, 0.87. The initial values of the iterative algorithm are: $\alpha = 5$ and $a = 1.65$. The MLEs of the
parameters of the PETL distribution are: $\hat\alpha = 5.3064$ and $\hat a = 0.0017$.

Figure 5 gives the graphical representation of the fitting of the different models that are considered over the
given data set (refer Sharma, 2020 and R Core Team, 2013). The K-S statistic for the fitting is 0.2727, with
p-value 0.8071. These values suggest that the PETL distribution is suitable for this data. Further, we
compare the goodness-of-fit statistics with the Beta, Kumaraswamy, unit-Gamma, Topp-Leone and Generalized
Topp-Leone (given by Shekhawat and Sharma, 2021) distributions, which are also defined on the unit interval.

Figure 5: Fitting of different models on the given dataset.

We use the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to identify the
best possible model for the given data set. The model that has the smallest values of the AIC and BIC statistics
is considered to be the best possible model among the distributions under comparison. The statistics are
computed by,


AIC = 2k − 2log(L)
BIC = k log(n) − 2log(L)


where k is the number of estimated parameters, L is the maximum value of the likelihood function and n is
the number of observations.
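The two criteria are simple functions of the maximized log-likelihood. The sketch below is a minimal illustration that recomputes them from the log(L) values listed in Table 4 for three of the two-parameter models, with $n = 11$ observations; the function names and the dictionary layout are assumptions made for the example.

```python
import math

def aic(log_l, k):
    """Akaike Information Criterion: 2k - 2 log(L)."""
    return 2.0 * k - 2.0 * log_l

def bic(log_l, k, n):
    """Bayesian Information Criterion: k log(n) - 2 log(L)."""
    return k * math.log(n) - 2.0 * log_l

n_obs = 11
# model name -> (log-likelihood from Table 4, number of estimated parameters)
models = {"PETL": (6.6380, 2), "Beta": (6.0499, 2), "Kumaraswamy": (5.6788, 2)}
for name, (log_l, k) in models.items():
    print(name, round(aic(log_l, k), 4), round(bic(log_l, k, n_obs), 4))
```

Since the log-likelihoods are positive here, both criteria come out negative, and the model with the most negative value is preferred.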

In Table 4, we show the MLEs and goodness-of-fit statistics for the PETL, Beta, Kumaraswamy,
unit-Gamma, Topp-Leone and GTL (given by Shekhawat and Sharma, 2021) distributions for the considered
dataset. We also present the K-S statistic, the corresponding p-values and the values of the maximized log-likelihood
functions. From Table 4, we observe that the proposed Power Exponentiated Topp-Leone distribution
fits the considered dataset well and has the smallest AIC and BIC along with the highest log(L) value
among all the distributions. Therefore, we recommend the use of the PETL distribution for modeling the
considered dataset over the other distributions compared.


Table 4: MLEs, AIC, BIC, K-S statistic and p-values for fitted models.


Comparison of Distributions
Model | log(L) | MLEs | AIC | BIC | K-S statistic | p-value
PETL ($\alpha$, $a$) | 6.6380 | (8.4832, 0.0614) | −9.2761 | −8.4803 | 0.2727 | 0.8071
Beta ($\alpha$, $\beta$) | 6.0499 | (6.6369, 4.2526) | −8.0998 | −7.3040 | 0.2727 | 0.8079
Kumaraswamy ($\alpha$, $\beta$) | 5.6788 | (4.1107, 4.5691) | −7.3577 | −6.5619 | 0.2727 | 0.8079
Unit-Gamma ($\alpha$, $\beta$) | 6.0774 | (4.2104, 8.0081) | −8.1549 | −7.3591 | 0.9090 | 0.0002
Topp-Leone ($\alpha$) | 6.0774 | (5.0433) | −8.8614 | −8.4635 | 0.2727 | 0.8079
GTL ($\alpha$, $\beta$) | 6.3733 | (0.0220, 6350.2850) | −8.7467 | −7.9509 | 0.2727 | 0.8079


6. Conclusion 


A two-parameter extension of the J-shaped Topp-Leone distribution, called the Power Exponentiated
Topp-Leone (PETL) distribution, is introduced for a possible application to model failure times (in days) of the
air conditioning system of an airplane. The proposed distribution has increasing, decreasing and bathtub
shaped hazard functions. The expressions of the ordinary moments, conditional moments, quantile function,
mean deviation, order statistics and entropy are discussed. Other important properties of the distribution such
as identifiability, ordinary differential equations, stochastic orderings and stress-strength reliability are also
discussed.

The estimation techniques for the parameters are also discussed. The simulation study that was conducted
supported the consistency of the ML estimators of the parameters. Further, an algorithm for the generation of
random samples from the proposed distribution is also given to facilitate future studies.

According to the model selection criteria AIC and BIC and the K-S goodness-of-fit test, the proposed
PETL distribution is a better model for fitting the air conditioning failure time data than
the Beta, Kumaraswamy, unit-Gamma, Topp-Leone and Generalized Topp-Leone (given by Shekhawat and
Sharma, 2021) distributions. Summing up, it can be concluded that the proposed PETL distribution can be
effectively used for modeling real life data defined on the unit interval.


Chapter 21


Testing the Goodness of Fit in 
Instrumental Variables Models 
Shalabh* and Subhra Sankar Dhar 


1. Introduction 


An important purpose of statistical modeling is that the fitted models are used in different applications.
For example, a fitted statistical model is used to make various types of forecasts. The success of such
applications depends upon how good the model is, i.e., how well the model fits the given set of data.
Only a well fitted model can provide valid results in further applications. The goodness of fit of a model
is usually judged by a statistic which itself is a random variable and whose value is computed on the basis of a
given data set. This estimated value reflects the value of the population parameter responsible for measuring
the goodness of fit, so different samples will generate different estimated values of the parameter. Moreover,
in order to further validate the statistical inferences, a test of hypothesis concerning the parameter being
estimated is also required. Only estimating the statistical parameter related to the goodness of fit may not
suffice.

Considering the set up of a multiple linear regression, the coefficient of determination, popularly known
as $R^2$, is used to judge the goodness of fit of the model based on the set of observations on a study variable
and a set of explanatory or independent variables. The goodness of fit through $R^2$ is measured by estimating
the squared population multiple correlation coefficient between the study and explanatory variables. The
$R^2$ is based on the ordinary least squares (OLS) estimator of the regression coefficients and is a consistent
estimator of the squared multiple correlation coefficient. Hence the test of hypothesis concerning the squared
population multiple correlation coefficient between the study and explanatory variables is conducted based
on the distribution of $R^2$. The suitable test statistics and test procedures are available in the literature, see
Anderson (2003, Chap. 4).

Note that the OLS estimator (OLSE) is the best linear unbiased estimator of the regression coefficients.
Hence the coefficient of determination is expected to give good results under the assumptions of the multiple
linear regression model. In real data analysis, one or more assumptions of the multiple linear regression
model are often violated. Suppose the assumption that the explanatory variables and the random errors are
statistically independent is violated. Such a violation is possible when the explanatory variables are stochastic,
which happens under different econometric models, e.g., errors-in-variables or measurement error models,
simultaneous equation models, time series models and more. When the explanatory variables and the random
errors are correlated, the OLSE no longer remains the best linear unbiased estimator of the regression
coefficients. In fact, the OLSE becomes not only biased but also an inconsistent estimator of the
regression coefficients under such a violation of assumptions. Consequently, the coefficient of determination
also becomes an inconsistent estimator of the squared population multiple correlation coefficient, see Cheng
et al. (2014, 2016). In this situation, hypothesis testing related to the squared multiple correlation
coefficient will be based on an inconsistent estimator of the multiple correlation coefficient, which may
provide invalid and erroneous statistical inferences. A natural question arises: how can the goodness of
fit be checked, and the hypothesis test conducted, when such assumptions are violated?

The instrumental variable (IV) estimation provides a consistent estimator of the regression parameters
under the multiple linear regression model when the explanatory variables and the random errors are not
statistically independent, and consequently, we expect that it may provide consistent tests for the relevant
hypothesis testing problems. The IVs are a set of variables which are highly correlated with the explanatory
variables, at least in the limit, and uncorrelated with the random errors, at least in the limit; see Bowden and
Turkington (1984), Wansbeek and Meijer (2000, Chap. 6) and others for an interesting exposition on IV estimation.
It is important to note that the goodness of fit is a value obtained by computing a statistic based on a
data sample, and that it is obtained through the estimation of a corresponding relevant population parameter.
The population parameter in this case is the squared multiple correlation coefficient between the study and
explanatory variables in the multiple linear regression model. Nevertheless, the validity of statistical analysis
and inferences remains incomplete without the hypothesis test. Measuring the degree of goodness of fit is
the fundamental requirement in any model fitting, but models fitted using IV estimation pose more
challenges. So we address two important questions: how to check the goodness of fit in the IV model, and
how to conduct the test of hypothesis for the squared multiple correlation coefficient based on IV estimates.

It is evident from the developments in the area of IV estimation that such issues are very pertinent to a
user in real life applications. For example, the choice of IVs is not unique, and different choices of IVs provide
different models for the same data. The goodness of fit statistic and its related hypothesis test help in choosing
the appropriate IVs to provide a better fitted model. For example, consider the three popular techniques to
choose the IVs, see Rao et al. (2008, Chap. 4, pp. 208-209). The Wald instrument technique divides
the observations on the explanatory variable into two groups and chooses the IVs as +1 and -1 for the two
groups. Similarly, the Bartlett instrument technique divides the observations on the explanatory variable into
three groups and chooses the IVs as +1, 0 and -1 for the upper, middle and lower groups respectively, and the
Durbin instrument technique uses the ranks of the observations on the explanatory variable as instruments.
The question now is how to decide which choice will give a better fitted model. A goodness of fit statistic
is needed to answer such questions and to decide which choice of IVs provides a better fitted model. More
complicated situations arise when more than one choice of IVs is used for the same explanatory variables
in a multiple linear regression model; which choice will yield a better model is then a question in which an
experimenter will always be interested. The goodness of fit statistic and the relevant hypothesis test based on IV
estimation can answer such queries.

The issue about how to measure the goodness of fit in the IV model is addressed in Dhar and Shalabh 
(2021) and a goodness of fit statistic is obtained which is based on the use of IV estimators but there is no 
knowledge available on how to conduct the hypothesis test. In this context, the proposed goodness of fit 
statistic consistently estimates the squared multiple correlation coefficient, which measures the goodness of 
fit of the model based on IV estimation using a data set on the study and explanatory variables. Now, one 
may be interested in testing the significance of the squared multiple correlation coefficient. For example, 
testing the significance that the squared multiple correlation coefficient equals any given value or more than 
any given value will shed more light on the status of goodness of fit. Such tests can be carried out using 
the goodness of fit statistics based on IV estimation. How to address this issue is thoroughly studied in this 
chapter in Section 5. 

The plan of the paper is as follows. We consider a general setup of the multiple linear regression model in
which the covariance matrix of the random errors is assumed to be unknown. The set ups of multiple linear
regression models and IV estimation are described in Section 2. A motivation to develop the goodness of fit
statistic is presented in Section 3. The development of goodness of fit statistics under such a set up is briefly
presented in Section 4 from Dhar and Shalabh (2021) for the sake of completeness and better understanding.
A consistent hypothesis test for testing the significance of the squared multiple correlation coefficient is
developed in Section 5, followed by some conclusions in Section 6. The proofs of the results are presented in the
Appendix in Section 7.


2. Instrumental variable (IV) estimation 


First we consider the multiple linear regression model with an intercept term,

$$y = X\beta + \epsilon, \tag{2.1}$$

where $y$ is the $(n \times 1)$ vector of observations on the study variable, $X$ is the $(n \times (p+1))$ matrix of $n$
observations on each of the $p$ explanatory variables and an intercept term, $\beta$ is the $((p+1) \times 1)$ vector
of regression coefficients associated with the $(p+1)$ explanatory variables, and $\epsilon$ is the $(n \times 1)$ vector of
random errors. The study variable $y$ is linearly related to the $p$ explanatory variables $X_1, X_2, \ldots, X_p$ and
the first column in $X$ has all unity elements representing the intercept term. We consider a very general
framework for the random errors $\epsilon$ in terms of their covariance matrix. We assume that $E(\epsilon) = 0$ and the
random errors are non-spherically distributed, with $E(\epsilon\epsilon') = \sigma^2\Omega^{-1}$, where $\Omega$ is an unknown positive definite
matrix. Suppose $\Omega$ is consistently estimated by an estimator $\hat\Omega$. Such a specification will also allow $\Omega = I$ as
the special case of identically and independently distributed random errors as in the case of the standard multiple
linear regression model.


We assume that the assumptions of the multiple linear regression model (see Rao et al. (2008, p. 34))
are satisfied but the assumption that $X$ and $\epsilon$ are uncorrelated, at least in the limit, is violated. It is assumed in
the usual multiple linear regression model that $\frac{X'X}{n} \xrightarrow{p} \Sigma_X$ and $\frac{X'\epsilon}{n} \xrightarrow{p} 0$ as $n \to \infty$, where $\xrightarrow{p}$ denotes
convergence in probability. So here we assume that $X$ and $\epsilon$ are correlated in the sense that,

$$\frac{X'\epsilon}{n} \xrightarrow{p} X^{*} \tag{2.2}$$

as $n \to \infty$, where $X^{*}$ is an arbitrary non-zero random variable. Such an assumption holds, e.g., when $X$ is
stochastic in nature, which arises in several econometric models. As the presence of the intercept term in the
usual multiple linear regression model is needed for the validity of the coefficient of determination, without
loss of generality, we also assume the presence of an intercept term in the model for the validity of the
proposed goodness of fit statistics based on IV estimation, which is studied in Section 4.

It is well known that the OLSE $b = (X'X)^{-1}X'y$ is the best linear unbiased estimator of $\beta$ under the
standard assumptions of a multiple linear regression model. It remains a consistent estimator of $\beta$ as long
as $X$ and $\epsilon$ are uncorrelated, at least in the limit. When $X$ and $\epsilon$ are correlated, at least in the limit, the
same OLSE $b$ becomes an inconsistent estimator of $\beta$. Consequently, the model diagnostic tools and
statistics based on the OLSE may then not provide correct statistical inferences. For example, the coefficient
of determination is based on the OLSE and measures the degree of goodness of fit. It is a consistent estimator
of the squared population multiple correlation coefficient as long as $X$ and $\epsilon$ are uncorrelated, at least in
the limit. The coefficient of determination becomes an inconsistent estimator of the squared population
multiple correlation coefficient when $X$ and $\epsilon$ become correlated, at least in the limit. There can be different
approaches to solve such issues. One approach is to use a consistent estimator of $\beta$ in place of $b$. The
instrumental variable estimation provides a consistent estimator of $\beta$ when $X$ and $\epsilon$ become correlated, in
the limit.
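
The inconsistency of the OLSE under correlated regressors and errors can be illustrated with a small simulation. The following is a minimal sketch under assumed settings that are not taken from the text: a single explanatory variable, a single instrument, a true slope of 2, and the hypothetical helper names `simulate`, `ols_slope` and `iv_slope`. In this one-regressor case the IV estimate reduces to the ratio of two sample covariances.

```python
import random

def cov(a, b):
    """Sample covariance (divisor n) of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def simulate(n=20000, beta=2.0, seed=0):
    """Draw (x, y, z): x is correlated with the error e, the instrument z is not."""
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)                  # instrument
        ei = rng.gauss(0.0, 1.0)                  # structural error
        xi = zi + 0.8 * ei + rng.gauss(0.0, 1.0)  # regressor correlated with e
        x.append(xi)
        y.append(1.0 + beta * xi + ei)
        z.append(zi)
    return x, y, z

def ols_slope(x, y):
    """OLS slope cov(x, y) / var(x); inconsistent when x and e are correlated."""
    return cov(x, y) / cov(x, x)

def iv_slope(x, y, z):
    """Simple IV slope cov(z, y) / cov(z, x) for one regressor and one instrument."""
    return cov(z, y) / cov(z, x)

x, y, z = simulate()
print("OLS slope:", round(ols_slope(x, y), 3))
print("IV slope: ", round(iv_slope(x, y, z), 3))
```

Under this assumed design the OLS slope converges to a value above the true slope, while the IV slope stays close to it as the sample size grows.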

The instruments in IV estimation are a set of variables that are highly correlated with the explanatory
variables and least correlated with the random errors, in the limit. Suppose $Z_1, Z_2, \ldots, Z_p$ is a set of $p$
instrumental variables that are correlated with $X$, in the limit, and uncorrelated with $\epsilon$, in the limit.
As with $X$, the observations on these instrumental variables are arranged in an $(n \times (p+1))$ matrix with
an intercept term $Z_0$, as $Z = (Z_0, Z_1, \ldots, Z_p)$. We assume that,

$$\frac{Z'\epsilon}{n} \xrightarrow{p} 0 \tag{2.3}$$
$$\frac{Z'X}{n} \xrightarrow{p} \Sigma_{ZX} > 0 \tag{2.4}$$
$$\frac{X'X}{n} \xrightarrow{p} \Sigma_{XX} > 0 \tag{2.5}$$
$$\frac{Z'Z}{n} \xrightarrow{p} \Sigma_{ZZ} > 0, \tag{2.6}$$

where $\Sigma_{ZX}$, $\Sigma_{XX}$ and $\Sigma_{ZZ}$ are non-singular positive definite matrices of constants.
The instrumental variable estimator is obtained in two stages as follows. Express $X$ in the
set up of a multiple linear regression model as,

$$X = Z\alpha^* + \zeta, \tag{2.7}$$

where $\alpha^*$ is the coefficient vector associated with $Z$, and $\zeta$ is the associated random error term with $E(\zeta) = 0$
and $E(\zeta\zeta') = \sigma^2\Omega^{-1}$, where $\Omega$ is an unknown positive definite matrix. Suppose $\Omega$ is consistently estimated
by an estimator $\hat\Omega$. The generalized least squares estimate of $\alpha^*$ is obtained in the first stage from (2.7) as,

$$\hat\alpha^*_{IV} = (Z'\hat\Omega Z)^{-1}Z'\hat\Omega X$$

and is used in the second stage. We obtain the predicted value of $X$ as $\hat X = Z\hat\alpha^*_{IV} = P_{Z\hat\Omega}X$,
where $P_{Z\hat\Omega} = Z(Z'\hat\Omega Z)^{-1}Z'$, in
$$y^* = P_{Z\hat\Omega}X\beta + \epsilon. \tag{2.8}$$

Applying generalized least squares to (2.8) yields the two stage feasible generalized least squares estimator
of $\beta$ as,
$$\hat\beta_{IV} = (X'P_{Z\hat\Omega}X)^{-1}X'P_{Z\hat\Omega}y^*. \tag{2.9}$$
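
The two-stage computation leading to (2.9) can be sketched numerically. The code below is an illustrative sketch, not the authors' implementation: it assumes NumPy, uses the projection $P = Z(Z'\hat\Omega Z)^{-1}Z'$ as written in the text, defaults to $\hat\Omega = I$ when no estimate is supplied, and generates hypothetical data through a helper `demo_data` whose design is invented for the example.

```python
import numpy as np

def projection(Z, omega_hat):
    """P = Z (Z' Omega Z)^{-1} Z', following the definition given in the text."""
    return Z @ np.linalg.solve(Z.T @ omega_hat @ Z, Z.T)

def iv_estimator(y, X, Z, omega_hat=None):
    """Two-stage estimator (X' P X)^{-1} X' P y, as in (2.9)."""
    n = X.shape[0]
    if omega_hat is None:
        omega_hat = np.eye(n)      # assumption: spherical errors, Omega = I
    P = projection(Z, omega_hat)
    XtP = X.T @ P
    return np.linalg.solve(XtP @ X, XtP @ y)

def demo_data(n=1000, seed=0):
    """Hypothetical data: one correlated regressor, one instrument, intercepts."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    e = rng.normal(size=n)
    x = z + 0.8 * e + rng.normal(size=n)   # x is correlated with the error e
    y = 1.0 + 2.0 * x + e                  # assumed true coefficients (1, 2)
    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z])
    return y, X, Z

y, X, Z = demo_data()
print(iv_estimator(y, X, Z))
```

With $\hat\Omega = I$ the projection is the usual hat matrix of $Z$, so the sketch coincides with classical two stage least squares; forming the full $n \times n$ matrix is done here only for readability and would be avoided for large $n$.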


The IV estimators have been an attractive choice for theoretical and applied researchers from various
perspectives in parametric, nonparametric, semiparametric, Bayesian and frequentist frameworks. An
important application of IV estimation is in handling measurement error models, which goes back to
Sargan (1958, 1971), Mallios (1969) and Leamer (1978); see also Iwata (1992) and Abarin and Wang (2012).
The group mean ordinary least squares estimator with an IV estimator is considered in Batistatou and
McNamee (2008), and a generalized IV estimator containing several common methods used in measurement errors
is discussed in Söderström (2011). The method of moments estimation using IVs in generalized linear
measurement error models under not necessarily normally distributed measurement errors in parametric and
nonparametric setups is considered in Abarin and Wang (2012). The IV estimation in an error-components
model is considered in Amemiya and MaCurdy (1986) and is illustrated under nonlinear measurement error
models in Amemiya (1990). The IV estimation in nonparametric and semiparametric models is considered
in, e.g., Park (2003), Newey and Powell (2003), Conley et al. (2008), Chib and Greenberg (2007), Horowitz
(2011), Carroll (2004) and others. The IV estimation with some applications from an econometrician's view
and for causal inference are discussed in Imbens (2014) and Baiocchi et al. (2014), respectively. The
relationships between the Bayesian and classical approaches to IV regression in simultaneous equation models
are established in Kleibergen and Zivot (2003). Bayesian IV estimation is discussed in Wiesenfarth et al.
(2014), Zellner et al. (2014), Lopes and Polson (2014), Gustafson (2007), etc. The IV estimation in a
random coefficient model is studied in Clarke and Windmeijer (2012) and Chesher and Rosen (2014). The IV
estimation in various other models is also available: varying-coefficient models are discussed in Zhao and Xue (2013);
quantile regression is investigated in Chernozhukov et al. (2007) and Horowitz and Lee (2007); time series
models are discussed in Kuersteiner (2001). Two stage IV estimation has been used extensively
with real data, see, e.g., Angrist (1990), Card (1995), Acemoglu et al. (2001), Kim et al. (2011), Fish et
al. (2010), Davies et al. (2013), Pokropek (2016) and many others.

Next we briefly describe how the goodness of fit statistic using the IV estimates can be found. Before
that, we present some details about the coefficient of determination to motivate the development of a goodness
of fit statistic in the IV model.


3. Goodness of fit in multiple linear regression model 


Consider the multiple linear regression model under the standard assumptions (see Rao et al. (2008, p.
34)), in which the random errors are independently and identically distributed with mean zero and
covariance matrix $\sigma^2 I$. The coefficient of determination, popularly denoted as $R^2$, is based on the OLSE
$b = (X'X)^{-1}X'y$ of $\beta$. The $R^2$ measures the goodness of fit in the classical multiple linear regression
model under the assumption, among others, that the explanatory variables and random errors are uncorrelated. The
$R^2$ is defined as the ratio of the sum of squares due to regression to the total sum of squares. It measures the
proportion of variation in the data explained by the fitted model based on the OLSE with respect to the total
variation. In the context of analysis of variance in the multiple linear regression model, the total sum of
squares is partitioned into two orthogonal components, viz., the sum of squares due to regression and the
sum of squares due to errors, see Rao et al. (2008, p. 57). Such a partitioning into orthogonal
components is possible only when the explanatory variables and the random errors are uncorrelated.

Assuming that the explanatory variables and random errors are uncorrelated and $X'P_1X/n \xrightarrow{p} \Sigma_X$, the
squared population multiple correlation coefficient between the study variable $y$ and the explanatory variables in
$X$ is given by,

$$\theta = \frac{\beta'\Sigma_X\beta}{\beta'\Sigma_X\beta + \sigma^2}, \tag{3.1}$$

where $\Sigma_X$ is a positive definite finite matrix, $P_1 = I_n - \frac{1}{n}e_ne_n'$, and $e_n = (1, 1, \ldots, 1)'$ is an $(n \times 1)$ vector
of all unity elements.

When $\sigma^2 = 0$, then $\theta = 1$, which indicates that the model is best fitted. On the other hand, if all the $\beta$'s are
zero, then $\theta = 0$, which indicates that the model is worst fitted. Similarly, any other value of $\theta$ measures
the goodness of fit of the model in terms of the squared population multiple correlation coefficient.

Obviously, the population multiple correlation coefficient is based on unknown parameters and is not
usable in real data applications. We need a suitable estimator of $\theta$. When the explanatory variables
and random errors are uncorrelated, an estimator of the squared population multiple correlation coefficient is
defined as,

$$R^2 = \frac{b'X'P_1Xb}{y'P_1y} = \frac{y'P_1X(X'P_1X)^{-1}X'P_1y}{y'P_1y}, \tag{3.2}$$

which is known as the coefficient of determination and measures the goodness of fit in terms of the ratio of the
sum of squares due to regression to the total sum of squares. The $R^2$ measures the proportion of variability
explained by the fitted model using the OLSE with respect to the total variability in the data. In line with the
interpretation of $\theta$, the value $R^2 = 0$ indicates that the model is worst fitted and $R^2 = 1$ indicates that
the model is best fitted. Any other value of $R^2$ between 0 and 1 reflects the degree of goodness
of fit. For example, if $R^2 = 0.7$, then the fitted model explains about 70% of the variation in the data.
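
As a numerical check on the definition of the coefficient of determination, the following sketch computes $R^2$ through the centering matrix $P_1 = I_n - \frac{1}{n}e_ne_n'$. It assumes NumPy; the toy data and the function names `centering_matrix` and `r_squared` are hypothetical and serve only as an illustration.

```python
import numpy as np

def centering_matrix(n):
    """P_1 = I_n - (1/n) e_n e_n', which subtracts the mean from a vector."""
    return np.eye(n) - np.ones((n, n)) / n

def r_squared(y, X):
    """R^2 = b'X'P_1Xb / y'P_1y with b the OLSE; X must contain a column of ones."""
    P1 = centering_matrix(len(y))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b                         # Xb
    return float(fitted @ P1 @ fitted / (y @ P1 @ y))

# Hypothetical toy data: one explanatory variable plus an intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
X = np.column_stack([np.ones_like(x), x])
print(round(r_squared(y, X), 4))
```

Because the model contains an intercept, this ratio agrees with the familiar form one minus the residual sum of squares over the total sum of squares.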


It is important to note that under the assumptions in the multiple linear regression model that $X'P_1\epsilon/n \xrightarrow{p} 0$
and $X'P_1X/n \xrightarrow{p} \Sigma_X > 0$ is a finite matrix, $R^2$ is a biased but consistent estimator of $\theta$, i.e., $E(R^2) \neq \theta$ and
$R^2 \xrightarrow{p} \theta$, see Anderson (2003).

The concept of the coefficient of determination has been extensively studied and extended for various models
in the literature. For example, the coefficient of determination in entropy form for generalized linear models
is proposed by Eshima and Tabata (2010, 2011); the logistic regression model is discussed in Tjur (2009),
Hong, Ham and Kim (2005), Liao and McGee (2003); a local polynomial model is proposed in Huang and
Chen (2008); a mixed regression model is discussed in Hössjer (2008); multivariate normal repeated measure
data is presented in Lipsitz et al. (2001); simultaneous equation models are discussed in Knight (1980); see
also Renaud and Victoria-Feser (2010), van der Linde and Tutz (2008), Srivastava and
Shobhit (2002), Marchand (1997, 2001), Nagelkerke (1991) and more for other developments including
generalizations of the coefficient of determination in various directions.


4. Goodness of fit in instrumental variable model 


Considering the basic philosophy of goodness of fit behind the definition of $R^2$, we formulate a statistic to
measure the goodness of fit in the IV model. It may be noted that $R^2$ is based on the consistent OLSE $b$ of $\beta$ and the
assumption that the explanatory variables and random errors are uncorrelated. When this assumption does not
hold, as in the IV model, $b$ becomes an inconsistent estimator of $\beta$ and the total sum of squares cannot
be partitioned into orthogonal components, i.e., the sum of squares due to regression and the sum of squares due
to error. Also, unlike the OLSE, the IV estimators are not the best linear unbiased estimators of $\beta$. A statistic
for quantitatively measuring the goodness of fit in IV models is developed in Dhar and Shalabh (2021)
and is discussed briefly as follows for the sake of completeness and better understanding. We will develop
the test of hypothesis based on this statistic later in Section 5.
We consider model (2.8) and express the total sum of squares as,

$$y^{*\prime}P_1y^* = (P_{Z\hat\Omega}X\beta + \epsilon)'P_1(P_{Z\hat\Omega}X\beta + \epsilon) = \beta'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\beta + 2\beta'X'P_{Z\hat\Omega}P_1\epsilon + \epsilon'P_1\epsilon. \tag{4.1}$$


The total sum of squares is partitioned into the sum of squares due to regression and due to errors, just as in the
case of the multiple linear regression model. Comparing (4.1) with the total sum of squares in the multiple
linear regression model, the first two terms in (4.1), viz., $\beta'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\beta$
and $2\beta'X'P_{Z\hat\Omega}P_1\epsilon$, can be considered to jointly constitute the sum of squares due to regression, and $\epsilon'P_1\epsilon$ is the sum of squares
due to errors. Since $\beta$ and $\epsilon$ are unknown in $\beta'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\beta$ and $2\beta'X'P_{Z\hat\Omega}P_1\epsilon$, we replace them
by the IV estimate $\hat\beta_{IV}(\hat\Omega) = \hat\beta_{IV} = (X'P_{Z\hat\Omega}X)^{-1}X'P_{Z\hat\Omega}y^*$ and the corresponding residuals $\hat\epsilon = y^* -
P_{Z\hat\Omega}X\hat\beta_{IV}(\hat\Omega)$, respectively. A statistic measuring the goodness of fit in IV models can then be constructed
as the ratio of the sum of squares due to regression to the total sum of squares as,

$$G^2_{IV}(\hat\Omega) = \frac{\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\hat\beta_{IV} + 2\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1\hat\epsilon}{y^{*\prime}P_1y^*}, \quad 0 \le G^2_{IV}(\hat\Omega) \le 1. \tag{4.2}$$


The statistic (4.2) can be used to measure the goodness of fit in the linear regression model obtained
through IV estimation. It can be used to measure the goodness of fit in the IV model with nonspherical
random errors and unknown covariance matrix. It is termed the Goodness of Instrumental Variable
Estimates (GIVE) statistic in IV models. Note that the presence of the term $2\beta'X'P_{Z\hat\Omega}P_1\epsilon$ in (4.1) and
$2\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1\hat\epsilon / y^{*\prime}P_1y^*$ in (4.2) distinguishes it from using $R^2$ to judge the goodness of fit in IV
models.
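
To make the construction of the GIVE statistic concrete, a minimal numerical sketch follows. It is written under stated assumptions rather than taken from Dhar and Shalabh (2021): NumPy is used, $\hat\Omega$ defaults to the identity and is assumed symmetric, the function name `give_statistic` is hypothetical, and the simulated data are invented for illustration.

```python
import numpy as np

def give_statistic(y, X, Z, omega_hat=None):
    """Return (G2, beta_iv, residuals) following the form of (4.2).

    Assumes omega_hat is symmetric, so that P = Z (Z' Omega Z)^{-1} Z' is symmetric.
    """
    n = len(y)
    if omega_hat is None:
        omega_hat = np.eye(n)                  # assumption: Omega = I
    P = Z @ np.linalg.solve(Z.T @ omega_hat @ Z, Z.T)
    P1 = np.eye(n) - np.ones((n, n)) / n       # centering matrix P_1
    XtP = X.T @ P
    beta = np.linalg.solve(XtP @ X, XtP @ y)   # IV estimate
    fitted = P @ X @ beta                      # P X beta_hat
    resid = y - fitted                         # IV residuals
    numerator = fitted @ P1 @ fitted + 2.0 * (fitted @ P1 @ resid)
    return float(numerator / (y @ P1 @ y)), beta, resid

# Hypothetical data with one correlated regressor and one instrument.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
e = rng.normal(size=n)
x = z + 0.8 * e + rng.normal(size=n)
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

g2, beta_iv, resid = give_statistic(y, X, Z)
print(round(g2, 3), beta_iv)
```

Since the total sum of squares equals the numerator plus the centred residual sum of squares, the value returned by this sketch can be cross-checked as one minus $\hat\epsilon'P_1\hat\epsilon / y'P_1y$.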

Let $\theta_{IV}(\Omega)$ be the squared population multiple correlation coefficient between the study and explanatory
variables in the IV model; it is the counterpart of the squared population multiple correlation coefficient
(3.1),

$$\theta_{IV}(\Omega) = \frac{\beta'\Sigma_{\Omega X}\beta}{\beta'\Sigma_{\Omega X}\beta + \sigma^2}, \quad 0 \le \theta_{IV}(\Omega) \le 1, \tag{4.3}$$

where $\Sigma_{\Omega X} = X'P_{Z\Omega}P_1P_{Z\Omega}X$ and $P_{Z\Omega} = Z(Z'\Omega Z)^{-1}Z'$. When the model is best fitted, then $\sigma^2 = 0$,
and consequently $\theta_{IV}(\Omega) = 1$. Similarly, when the model is worst fitted, all the $\beta$'s will ideally be
zero (or close to zero), indicating that none of the explanatory variables contributes to the
model, and consequently $\theta_{IV}(\Omega) = 0$. Any other value of $\theta_{IV}(\Omega)$ lying between 0 and 1 indicates the
goodness of the fitted model as measured by the squared multiple correlation coefficient.


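Consider,
$$\frac{X'P_{Z\hat\Omega}\epsilon}{n} = \left(\frac{X'Z}{n}\right)\left(\frac{Z'\hat\Omega Z}{n}\right)^{-1}\left(\frac{Z'\epsilon}{n}\right) \xrightarrow{p} 0 \tag{4.4}$$
$$\frac{X'P_{Z\hat\Omega}X}{n} = \left(\frac{X'Z}{n}\right)\left(\frac{Z'\hat\Omega Z}{n}\right)^{-1}\left(\frac{Z'X}{n}\right) \xrightarrow{p} \Sigma_{XZ}\Sigma_{Z\Omega Z}^{-1}\Sigma_{ZX}. \tag{4.5}$$

Since $\hat\Omega$ is a consistent estimator of $\Omega$, under (4.4) and (4.5), we observe that,

$$\text{plim}\,\frac{Z'P_{Z\hat\Omega}X}{n} = \text{plim}\left(\frac{Z'Z}{n}\right) \cdot \text{plim}\left(\frac{Z'\hat\Omega Z}{n}\right)^{-1} \cdot \text{plim}\left(\frac{Z'X}{n}\right) = \Sigma_{ZZ}\Sigma_{Z\Omega Z}^{-1}\Sigma_{ZX} = \Sigma_{ZX\Omega} > 0 \tag{4.6}$$
$$\text{plim}\,\frac{Z'P_{Z\hat\Omega}Z}{n} = \text{plim}\left(\frac{Z'Z}{n}\right) \cdot \text{plim}\left(\frac{Z'\hat\Omega Z}{n}\right)^{-1} \cdot \text{plim}\left(\frac{Z'Z}{n}\right) = \Sigma_{ZZ}\Sigma_{Z\Omega Z}^{-1}\Sigma_{ZZ} = \Sigma_{ZZ\Omega} \ge 0. \tag{4.7}$$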


Hence, $G^2_{IV}(\hat\Omega) \xrightarrow{p} \theta_{IV}(\Omega)$ as $n \to \infty$, i.e., $G^2_{IV}(\hat\Omega)$ is a consistent estimator of $\theta_{IV}(\Omega)$. Here $\theta_{IV}(\Omega)$
measures the goodness of fit of the model; therefore, an estimate of $\theta_{IV}(\Omega)$ is also expected to have the
same interpretation in the fitted model obtained through IV estimation. When $\sigma^2 = 0$, then $G^2_{IV}(\hat\Omega) = 1$,
indicating that the model is best fitted. On the other hand, if all estimated regression coefficients are close to
zero, or exactly zero, then the corresponding explanatory variables are not significant,
the model is worst fitted, and $G^2_{IV}(\hat\Omega) = 0$. Any other value of $G^2_{IV}(\hat\Omega)$ lying between
0 and 1 can be considered as measuring the degree of goodness of fit of the model using instrumental variables
for the given explanatory variables and sample size. For example, if $G^2_{IV}(\hat\Omega) = 0.95$, then
95% of the variation in the values of the study variable is explained by the fitted IV model. Alternatively,
in simple language, the fitted IV model is approximately 95% good.


5. Test based on GIVE statistic 


In order to carry out hypothesis testing based on the GIVE statistic, one needs to know the
distribution of the GIVE statistic. Deriving the exact distribution of
$G^2_{IV}(\cdot)$ is intractable in most cases. Even if it is possible to derive it in a few cases, the expressions
may be so complicated that they shed no light on its behaviour. Therefore, we use the asymptotic
distribution of $G^2_{IV}(\cdot)$ in this section. The following three cases are considered here: (i) when
the random errors are identically and independently distributed and their covariance matrix is of the form
$\sigma^2 I$, (ii) when the random errors are not identically and/or independently distributed and their covariance
matrix is a known non-diagonal matrix of the form $\Omega$, and (iii) when $\Omega$ is unknown. The results for case
(iii) are more general, so to keep the presentation concise we state the result for (iii) and derive the results for (i)
and (ii) directly from it. The asymptotic distribution of $G^2_{IV}(\hat\Omega)$ after appropriate
normalization is stated in Theorem 1. The other two cases, i.e., (i) and (ii), are discussed in the remark following
Theorem 1.

To derive the asymptotic distribution of the GIVE statistic, we need to assume the following
conditions:
(A1) The parameter space of $\beta$ is compact.
(A2) $X$ is a bounded random variable almost surely.
(A3) $Z$ is a bounded random variable almost surely.
(A4) The random variables $Z$ and $u$ are independent.
(A5) The random variables $Z$ and $\epsilon$ are independent.
(A6) The correlation between $\epsilon$ and $u$ does not equal zero.
(A7) Let $\hat\Omega = ((\hat\sigma_{i,j}))$, where $1 \le i, j \le p$, and $\hat\sigma_{i,j} > 0$ with probability one for all $i, j$, and


$\hat\sigma_{i,j} - \sigma_{i,j} = O_p\left(\frac{1}{n}\right)$ for all $i$ and $j$.
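Let $a = (a_1, a_2, \ldots, a_d) \in \mathbb{R}^d$ be an arbitrary $d$-dimensional vector,

$$V = (X^TP_{Z\Omega}X)^{-1}(X^TP_{Z\Omega}\Omega P_{Z\Omega}X)((X^TP_{Z\Omega}X)^T)^{-1},$$

$$g(a) = \frac{a^TX^TP_{Z\Omega}(2(Xa + u) - P_{Z\Omega}Xa)}{(P_{Z\Omega}Xa + u)^TP_1(P_{Z\Omega}Xa + u)} \quad \text{and}$$

$$\nabla g(a) = \left(\frac{\partial g(a)}{\partial a_1}, \frac{\partial g(a)}{\partial a_2}, \ldots, \frac{\partial g(a)}{\partial a_d}\right).$$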

Theorem 1 Under conditions (A1)-(A7), $\sqrt{n}\{G^2_{IV}(\hat\Omega) - \theta_{IV}(\Omega)\}$ converges weakly to a normal distribution
with mean 0 and variance $\{\nabla g(a)\}^TV\{\nabla g(a)\}|_{a = \beta_{IV}(\Omega)}$.


Remark: The proof of Theorem 1 is provided in the Appendix for the sake of completeness and better
reading, although it is available in Dhar and Shalabh (2021). Note that for a known $\Omega$, i.e., case (ii), the
result remains the same as long as $\hat\Omega$ is a consistent estimator of $\Omega$. Case (i) is a special case of case (ii), and
hence the asymptotic normality of $\sqrt{n}(G^2_{IV}(I) - \theta_{IV}(I))$ follows directly from the assertion in Theorem 1 by
replacing $\Omega$ by $I$. The readers may refer to Lemma 3 and Corollary 1 in Section 7 for the precise statements
of the results for cases (i) and (ii).

Having obtained the asymptotic distribution of the GIVE statistic, we now formulate a hypothesis
testing problem related to the squared multiple correlation coefficient $\theta_{IV}(\Omega)$ based on $G^2_{IV}(\hat\Omega)$. Suppose that
we want to test,

$$H_0: \theta_{IV}(\Omega) = \theta_0$$
against
$$H_1: \theta_{IV}(\Omega) > \theta_0,$$

where $\theta_0 \in (0, 1]$ is known and specified. In order to test $H_0$ against $H_1$, let us consider the test statistic
$T_n = \sqrt{n}(G^2_{IV}(\hat\Omega) - \theta_0)$. The following theorem describes the large sample property of the test based
on $T_n$.


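Theorem 2 Let $c_\alpha$ be the $(1 - \alpha)$-th quantile of a normal distribution with mean 0 and variance
$\{\nabla g(a)\}^TV\{\nabla g(a)\}|_{a = \beta_{IV}(\Omega)}$, where $\alpha \in (0, 1)$. Then under conditions (A1)-(A7), $P_{H_0}[T_n > c_\alpha] \to \alpha$
as $n \to \infty$, and $P_{H_1}[T_n > c_\alpha] \to 1$ as $n \to \infty$.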


Theorem 2 asserts that the test based on $T_n$ achieves the level $\alpha$ as $n \to \infty$, and moreover, the
power of the test converges to one as $n \to \infty$, i.e., the test based on $T_n$ is a consistent test.
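
The decision rule of the one-sided test can be sketched as follows. This is only an illustration under assumptions that are not part of the text: the asymptotic variance $\{\nabla g(a)\}^TV\{\nabla g(a)\}$ is taken as already estimated and passed in as `asym_var`, the function name `give_test` is hypothetical, and the numbers in the usage line are invented.

```python
from math import sqrt
from statistics import NormalDist

def give_test(g2, theta0, n, asym_var, alpha=0.05):
    """One-sided test of H0: theta_IV = theta0 against H1: theta_IV > theta0.

    g2       : observed GIVE statistic
    theta0   : hypothesised value of the squared multiple correlation
    n        : sample size
    asym_var : assumed (pre-computed) estimate of the asymptotic variance
    Returns (T_n, c_alpha, reject) where reject is True when T_n > c_alpha.
    """
    t_n = sqrt(n) * (g2 - theta0)
    # (1 - alpha)-th quantile of N(0, asym_var)
    c_alpha = NormalDist(mu=0.0, sigma=sqrt(asym_var)).inv_cdf(1.0 - alpha)
    return t_n, c_alpha, t_n > c_alpha

# Hypothetical numbers, for illustration only.
t_n, c_alpha, reject = give_test(g2=0.95, theta0=0.90, n=400, asym_var=0.25)
print(round(t_n, 3), round(c_alpha, 3), reject)
```

In practice the variance would have to be estimated from the data, for instance by plugging the IV estimate into the gradient of $g$; that step is deliberately left outside this sketch.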


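Proof of Theorem 2: First note that $P_{H_0}[T_n > c_\alpha] \to \alpha$ as $n \to \infty$, which follows directly from the
assertion of Theorem 1 since $\theta_{IV}(\Omega) = \theta_0$ under $H_0$.
Next, consider $H_1: \theta_{IV}(\Omega) = \theta_1$, where $\theta_1 > \theta_0$. Hence,

$$\begin{aligned}
\lim_{n \to \infty} P_{H_1}[T_n > c_\alpha] &= \lim_{n \to \infty} P_{H_1}[\sqrt{n}(G^2_{IV}(\hat\Omega) - \theta_0) > c_\alpha] \\
&= \lim_{n \to \infty} P_{H_1}[\sqrt{n}(G^2_{IV}(\hat\Omega) - \theta_1 + \theta_1 - \theta_0) > c_\alpha] \\
&= \lim_{n \to \infty} P_{H_1}[\sqrt{n}(G^2_{IV}(\hat\Omega) - \theta_1) + \sqrt{n}(\theta_1 - \theta_0) > c_\alpha] \\
&= \lim_{n \to \infty} P_{H_1}[\sqrt{n}(G^2_{IV}(\hat\Omega) - \theta_1) > c_\alpha - \sqrt{n}(\theta_1 - \theta_0)] \\
&= P[Z > -\infty] = 1,
\end{aligned}$$

where $Z$ is a random variable associated with a normal distribution with mean 0 and variance
$\{\nabla g(a)\}^TV\{\nabla g(a)\}|_{a = \beta_{IV}(\Omega)}$, and the last step follows since $\sqrt{n}(\theta_1 - \theta_0) \to \infty$ as $n \to \infty$. This
completes the proof.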


Remark: Note that similar hypothesis testing problems, such as $H_0^*: \theta_{IV}(\Omega) = \theta_0$ against
$H_1^*: \theta_{IV}(\Omega) < \theta_0$ and $H_0^{**}: \theta_{IV}(\Omega) = \theta_0$ against $H_1^{**}: \theta_{IV}(\Omega) \neq \theta_0$, can also be resolved based on the
GIVE statistic $G^2_{IV}(\hat\Omega)$. In order to test $H_0^*$ against $H_1^*$, one can use the test statistic $T_n^* = \sqrt{n}(\theta_0 - G^2_{IV}(\hat\Omega))$,
and for testing $H_0^{**}$ against $H_1^{**}$, one may use the test statistic $T_n^{**} = \sqrt{n}|G^2_{IV}(\hat\Omega) - \theta_0|$. It should
be mentioned that the tests based on $T_n^*$ and $T_n^{**}$ will also be consistent.


6. Conclusions 


We have addressed the issue of hypothesis testing of the squared multiple correlation coefficient based on
IV estimation. A statistic, viz., the GIVE statistic, is formulated to measure the goodness of fit in IV models.
It is very difficult to obtain the exact distribution of the test statistic. So, using the asymptotic distribution of
the GIVE statistic, a test statistic for a consistent hypothesis test is derived. The test statistic is quite general
and can be used when the covariance matrix of the random errors is a diagonal matrix or any known or
unknown positive definite matrix.


7. Appendix: Some technicalities 


We state some results here which are used in the results in the earlier sections. 
First we show that the IV estimator is a consistent estimator of regression coefficients. 


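Lemma 1 $\hat\beta_{IV} \xrightarrow{p} \beta$ as $n \to \infty$, where $\hat\beta_{IV}$ is the same as defined in (2.9).
Proof of Lemma 1: Note that,
$$(\hat\beta_{IV} - \beta) = (X'P_{Z\hat\Omega}X)^{-1}X'P_{Z\hat\Omega}(P_{Z\hat\Omega}X\beta + \epsilon) - \beta = (X'P_{Z\hat\Omega}X)^{-1}X'P_{Z\hat\Omega}\epsilon. \tag{7.1}$$

Using (7.1), we have,

$$\text{plim}(\hat\beta_{IV} - \beta) = \text{plim}\left(\frac{X'P_{Z\hat\Omega}X}{n}\right)^{-1}\text{plim}\left(\frac{X'P_{Z\hat\Omega}\epsilon}{n}\right) = \left(\Sigma_{XZ}\Sigma_{Z\Omega Z}^{-1}\Sigma_{ZX}\right)^{-1} \cdot 0 = 0. \tag{7.2}$$

It completes the proof.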
Now we show that the GIVE statistic is a consistent estimator of the squared population multiple
correlation coefficient between the study and explanatory variables.
Lemma 2 $G^2_{IV}(\hat\Omega) \xrightarrow{p} \theta_{IV}(\Omega)$ as $n \to \infty$, where $G^2_{IV}(\hat\Omega)$ and $\theta_{IV}(\Omega)$ are the same as defined in (4.2) and
(4.3), respectively.


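Proof of Lemma 2: We first note that,

$$\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\hat\beta_{IV} \xrightarrow{p} \beta'\Sigma_{\Omega X}\beta.$$

Next, consider,

$$\begin{aligned}
[\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1\hat\epsilon] &= [\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1(y^* - P_{Z\hat\Omega}X\hat\beta_{IV})] \\
&= [\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1(P_{Z\hat\Omega}X\beta + \epsilon - P_{Z\hat\Omega}X\hat\beta_{IV})] \\
&= [\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\beta] + [\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1\epsilon] - [\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\hat\beta_{IV}] \\
&\xrightarrow{p} \beta'\Sigma_{\Omega X}\beta + 0 - \beta'\Sigma_{\Omega X}\beta = 0,
\end{aligned}$$

and

$$\begin{aligned}
(y^{*\prime}P_1y^*) &= (P_{Z\hat\Omega}X\beta + \epsilon)'P_1(P_{Z\hat\Omega}X\beta + \epsilon) \\
&= (\beta'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\beta + 2\beta'X'P_{Z\hat\Omega}P_1\epsilon + \epsilon'P_1\epsilon) \\
&\xrightarrow{p} \beta'\Sigma_{\Omega X}\beta + 0 + \sigma^2 = \beta'\Sigma_{\Omega X}\beta + \sigma^2.
\end{aligned}$$

Thus, we have,

$$G^2_{IV}(\hat\Omega) = \frac{(\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1P_{Z\hat\Omega}X\hat\beta_{IV} + 2\hat\beta_{IV}'X'P_{Z\hat\Omega}P_1\hat\epsilon)}{(y^{*\prime}P_1y^*)} \xrightarrow{p} \frac{\beta'\Sigma_{\Omega X}\beta}{\beta'\Sigma_{\Omega X}\beta + \sigma^2} = \theta_{IV}(\Omega), \quad 0 \le \theta_{IV}(\Omega) \le 1.$$

The proof is complete in view of the first three results of the proof and the fact that $\hat\Omega$ is a consistent estimator
of $\Omega$.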
In order to prove Theorem 1, one needs to prove the following lemma. 


Lemma 3 Under conditions (A1)-(A6), $\sqrt{n}\{G^2_{IV}(\Omega) - \theta_{IV}(\Omega)\}$ converges weakly to a normal distribution
with mean 0 and variance $\{\nabla g(a)\}^TV\{\nabla g(a)\}|_{a = \beta_{IV}(\Omega)}$.


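Proof of Lemma 3: To prove this lemma, we first note that,

$$G^2_{IV}(\Omega) = \frac{\hat\beta_{IV}(\Omega)'X'P_{Z\Omega}(2y^* - P_{Z\Omega}X\hat\beta_{IV}(\Omega))}{y^{*\prime}P_1y^*} = \frac{\hat\beta_{IV}(\Omega)'X'P_{Z\Omega}(2y^* - P_{Z\Omega}X\hat\beta_{IV}(\Omega))}{(P_{Z\Omega}X\hat\beta_{IV}(\Omega) + \hat\epsilon)'P_1(P_{Z\Omega}X\hat\beta_{IV}(\Omega) + \hat\epsilon)}$$

and we have $\theta_{IV}(\Omega) = \dfrac{\beta'\Sigma_{\Omega X}\beta}{\beta'\Sigma_{\Omega X}\beta + \sigma^2}$.
Hence, $G^2_{IV}(\Omega) = g(\hat\beta_{IV}(\Omega))$ and $\theta_{IV}(\Omega) = g(\beta_{IV}(\Omega))$, where for any $a = (a_1, a_2, \ldots, a_d) \in \mathbb{R}^d$
$(d \ge 1)$,

$$g(a) = \frac{a'X'P_{Z\Omega}(2(Xa + \epsilon) - P_{Z\Omega}Xa)}{(P_{Z\Omega}Xa + \epsilon)'P_1(P_{Z\Omega}Xa + \epsilon)}, \quad g: \mathbb{R}^d \to \mathbb{R}.$$

Now, Proposition 2.27 of van der Vaart (1998) implies that $\sqrt{n}(\hat\beta_{IV}(\Omega) - \beta)$ converges weakly to a
normal distribution with mean 0 and variance $V$, where,

$$V = (X'P_{Z\Omega}X)^{-1}(X'P_{Z\Omega}\Omega P_{Z\Omega}X)((X'P_{Z\Omega}X)')^{-1}.$$

Therefore, by the delta method (see, e.g., van der Vaart (1998)), one can conclude that
$\sqrt{n}(G^2_{IV}(\Omega) - \theta_{IV}(\Omega))$ converges weakly to a Gaussian distribution with mean 0 and variance
$\{\nabla g(a)\}'V\{\nabla g(a)\}|_{a = \beta_{IV}(\Omega)}$, where $\nabla g(a) = \left(\frac{\partial g(a)}{\partial a_1}, \ldots, \frac{\partial g(a)}{\partial a_d}\right)$, which completes the proof.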


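Proof of Theorem 1: Note that

$$G^2_{IV}(\hat\Omega) = \frac{\hat\beta_{IV}(\hat\Omega)'X'P_{Z\hat\Omega}(2y^* - P_{Z\hat\Omega}X\hat\beta_{IV}(\hat\Omega))}{y^{*\prime}P_1y^*} \quad \text{and} \quad G^2_{IV}(\Omega) = \frac{\hat\beta_{IV}(\Omega)'X'P_{Z\Omega}(2y^* - P_{Z\Omega}X\hat\beta_{IV}(\Omega))}{y^{*\prime}P_1y^*},$$

hence,

$$\begin{aligned}
\sqrt{n}\{G^2_{IV}(\hat\Omega) - \theta_{IV}(\Omega)\} &= \sqrt{n}\{G^2_{IV}(\hat\Omega) - G^2_{IV}(\Omega) + G^2_{IV}(\Omega) - \theta_{IV}(\Omega)\} \\
&= \sqrt{n}\{G^2_{IV}(\hat\Omega) - G^2_{IV}(\Omega)\} + \sqrt{n}\{G^2_{IV}(\Omega) - \theta_{IV}(\Omega)\}.
\end{aligned}$$

The assertion in Lemma 3 implies that $\sqrt{n}\{G^2_{IV}(\Omega) - \theta_{IV}(\Omega)\}$ converges weakly to a normal distribution
with mean 0 and variance $\{\nabla g(a)\}^TV\{\nabla g(a)\}|_{a = \beta_{IV}(\Omega)}$. So we now check whether $\sqrt{n}\{G^2_{IV}(\hat\Omega) - G^2_{IV}(\Omega)\} \xrightarrow{p}
0$ as $n \to \infty$. If this holds, then the proof is complete.

Further, note that $\hat\beta_{IV}(\hat\Omega) = (X'P_{Z\hat\Omega}X)^{-1}X'P_{Z\hat\Omega}y^*$, $\hat\beta_{IV}(\Omega) = (X'P_{Z\Omega}X)^{-1}X'P_{Z\Omega}y^*$, and $\hat\Omega$ is a
consistent estimator of $\Omega$.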

Using these expressions and aforementioned fact, we have, 


vn{ Gy (Q) — Giy(Q)} 
vn 


= PP roo )'X"(Pze — Pza)2y* + Bry (Q)' X' Pzo2y™ 


+ {Bry (Q)} — Brv(Q)X'{Pzq — Pzo}2y"]. 


For a detailed derivation, the readers may look at Dhar and Shalabh (2021). 
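Now, note that since the parameter space of $\beta$ is compact (see (A1)), and $\hat\sigma_{i,j} - \sigma_{i,j} = O_p\left(\frac{1}{n}\right)$ for all $i$
and $j$ (see (A7)), we have $\sqrt{n}(P_{Z\hat\Omega} - P_{Z\Omega}) = O_p\left(\frac{1}{\sqrt{n}}\right)$. Moreover, since $\{\hat\beta_{IV}(\hat\Omega) - \hat\beta_{IV}(\Omega)\} = o_p(1)$,
and $X$ and $Z$ are bounded random variables, we have,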


Vn[Brv (Q)'X' (Pze — Pzo)2y* + Brv(Q)'X' Pza2y* + {Brv (Q)} — Brv (QYX' {Pe — Pzo}2y"] ne 
y*!Pry* + op(1) 


as N —> 00. 
Thus the condition holds and along with the fact of Lemma 3, it completes the proof. 


Corollary 1: Under conditions (A1)-(A6), $\sqrt{n}\{G^2_{IV}(I) - \theta_{IV}(I)\}$ converges weakly to a normal distribution
with mean 0 and variance $\{\nabla g(a)\}^TV\{\nabla g(a)\}|_{a = \beta_{IV}(\sigma^2I_n)}$.


Proof of Corollary 1: The proof follows using the same arguments as the proof of Lemma 3.
Chapter 22


Probability Distribution Analysis for 
Rainfall Scenarios 
A Case Study 


Md Abdul Khalek,* Md Mostafizur Rahaman, Md Kamruzzaman,
Md Mesbahul Alam and M Sayedur Rahman


1. Introduction 


Bangladesh, like many other countries, is expected to experience variations in climatic variables over time.
Estimating the amount of water available to meet the diverse needs of agriculture, industry, storm water
design, and other human activities requires a thorough understanding of rainfall. Knowing the
characteristics of these variables is critical for uncovering patterns in the data that could
have substantial policy implications for a country in the short and long term. A country's climate is directly
influenced by its geographical location and physical circumstances. Bangladesh is bordered on the north
by the Himalayas and on the south by the Bay of Bengal. Its latitude ranges from 20°34'N to 26°38'N and
its longitude from 88°01'E to 92°41'E. The country receives an average of 2200 mm of rain per year, with
yearly rainfall ranging from 339 mm to 5000 mm. In NW Bangladesh, the months of June to September often
receive 80 to 85 percent of the total normal annual rainfall. Finding an appropriate distribution for rainfall data
involves selecting a probability distribution for the model variables, estimating the parameters of the selected
distribution in a way that supports effective decision-making, and evaluating the goodness of fit.

According to statistical theory, the frequency of extreme rainfall occurrences depends more on
changes in variability (generally, the scale parameter) than on the climate mean (Al Mamun et al., 2021;
Alamgir et al., 2020; Katz and Brown, 1992). Rainfall probability distributions have been investigated by a
number of scholars. The gamma distribution has been used to describe the probability distribution of rainfall over
monthly and annual periods (McCuen, 2016; Maity, 2018). The LN2 distribution is the best-fit probability
distribution for India's annual maximum rainfall, according to Kumar (2000), Tabish et al. (2015), and Singh
(2001). Amin et al. (2016) found that the log-Pearson type III (LP3) distribution was the most
suitable for annual maximum daily rainfall data in northern Pakistan.

According to Kwaku et al. (2007), the LN2 distribution was the best fit for one to five consecutive days of
maximum rainfall in Accra, Ghana. According to Olofintoye et al. (2009), Al Mamoon and Rahman (2017),
Alam et al. (2018), and Ogarekpe et al. (2020), 50 percent of stations in Nigeria follow LP3 distributions and
40 percent follow Pearson type III (P3) distributions for peak daily rainfall. A study of the annual maxima of
daily rainfall at five locations in South Korea from 1961 to 2001 found that the Gumbel distribution offered
the most reasonable fit at four of the five locations analyzed (Kamal et al., 2021; Mohsenipour et al., 2020;
Nadarajah and Choi, 2007). Deka et al. (2009) compared five extreme value distributions to determine the
most suitable one for describing the annual maximum rainfall series over 42 years at nine distant stations in
north-east India. The normal and gamma distributions were found to be the best-fitted probability distributions
for annual rainfall data for fourteen Sudanese rainfall stations from 1971 to 2010 (Mahmud et al., 2020;
Mahgoub et al., 2017).

Sharma and Singh (2010) used the least squares method to identify the best-fitted probability distribution
among sixteen candidate distributions for 37 years of daily maximum rainfall data from Pantnagar, India.
According to Lee (2005), the log-Pearson type III distribution fitted 50% of the stations when
characterising the rainfall distribution on the Chinese plain. According to Ogunlela (2001), the
log-Pearson type III distribution best describes peak daily rainfall in a stochastic analysis.
Choi et al. (2021) and Baskar et al. (2006) found that the gamma distribution was
more appropriate than the other distributions in the frequency analysis of maximum rainfall over consecutive days at
Banswara, Rajasthan, India.

Information on the appropriate rainfall distribution in various locations of Bangladesh is critical for helping
planners form policy decisions. Islam et al. (2021) and Rahman and Lateh (2016) assessed
Bangladesh's climatic attributes based on standard rainfall series and geographic data
for the period 1971-2010. Hossian et al. (2016) fitted several types of probability distributions to monthly
maximum temperature in Dhaka, Bangladesh, with
the skew logistic distribution proving to be the most acceptable. Ghosh et al. (2016) and Khudri and Sadia
(2013) found that the generalized extreme value distribution provides the best fit for
monthly rainfall data in Chittagong, Rajshahi, Sylhet, and Dhaka.

This study has three primary objectives: (1) describing the nature of rainfall at three
locations in northwest (NW) Bangladesh, namely Bogura, Ishurdi and Rajshahi, (2) fitting probability
distributions to maximum monthly rainfall and (3) determining the best-fit distribution of seasonal and
annual rainfall for each location. The analysis proceeds through the following steps: data collection,
exploratory analysis, detection of outliers, trimming, and measurement of goodness of fit. In
addition to summarizing the data, a number of descriptive methods are
employed to set out the nature of the data, probability distributions are fitted to maximum rainfall at the
different sites, the adequacy of each fit is examined, and the best-fit distribution is selected based
on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
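
The role of the AIC and BIC in choosing among candidate distributions can be illustrated with a small sketch. It is not the chapter's analysis: it compares only two of the simpler candidates (exponential and 2-parameter log-normal) because their maximum likelihood estimates have closed forms, it uses simulated values in place of the rainfall records, and all function names are hypothetical. The criteria are computed as AIC $= 2k - 2\ell$ and BIC $= k\ln n - 2\ell$, with $\ell$ the maximised log-likelihood and $k$ the number of parameters.

```python
import math
import random

def exp_loglik(x):
    """Maximised log-likelihood of the exponential model (rate = 1 / mean)."""
    n = len(x)
    mean = sum(x) / n
    return -n * math.log(mean) - n

def lognormal_loglik(x):
    """Maximised log-likelihood of the 2-parameter log-normal model."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mu = sum(logs) / n
    s2 = sum((l - mu) ** 2 for l in logs) / n     # MLE of the log-scale variance
    return -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * s2) - 0.5 * n

def aic(loglik, k):
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2.0 * loglik

def rank_models(x):
    """Return AIC and BIC for each candidate; smaller values indicate a better fit."""
    n = len(x)
    fits = {"Exp": (exp_loglik(x), 1), "LN2": (lognormal_loglik(x), 2)}
    return {name: {"AIC": aic(ll, k), "BIC": bic(ll, k, n)}
            for name, (ll, k) in fits.items()}

# Simulated stand-in for a series of positive rainfall maxima (not real data).
rng = random.Random(0)
sample = [rng.lognormvariate(4.0, 0.3) for _ in range(200)]
for name, scores in rank_models(sample).items():
    print(name, {key: round(val, 1) for key, val in scores.items()})
```

The same comparison extends to the other candidate distributions once their log-likelihoods are maximised numerically.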

The study is organised into several sections, including this one. In the second section, we review the theory
and methodology of the techniques used to fit probability distributions, discuss the
basic concepts of the probability distributions considered, and apply various goodness-of-fit tests to the
fitted candidate distributions. Finally, a brief summary
is provided and some areas for further study are suggested.


2. Methods and materials 

This section describes the data and the commonly used probability distributions that are fitted to the rainfall
data of each location in order to find the best fit.

2.1 Data source 


This research is based on maximum annual, seasonal and monthly rainfall. The study area is
NW Bangladesh; the country as a whole lies between 20°34'N and 26°38'N latitude and
88°01'E and 92°41'E longitude. Three meteorological stations in the area (namely, Bogura, Ishurdi
and Rajshahi) were analyzed. These three stations were selected based on their geographical
locations, which were expected to represent the characteristics of NW Bangladesh as a whole. The names
and locations of the rainfall stations are shown in Figure 1. The analysis is numerical and
based on secondary data: 55 years (1964-2018)
of climatic indicator (monthly maximum rainfall) data for the three selected stations were collected from
the Bangladesh Meteorological Department.

Figure 1: Location of rainfall stations used in this study; Source: Author.


2.2 Fitting probability distributions 


In this study, we fit six different distributions to maximum rainfall and compare their performance: the 
2-parameter log-normal (LN2), Exponential (Exp), Weibull (W2), Pearson type III (P3), log-Pearson 
type III (LP3) and Generalized Extreme Value (GEV) distributions. These distributions are popular and 
convenient for analyzing the frequency of extreme events (Li et al., 2015). The parameters of the candidate 
distributions are estimated by the maximum likelihood method. The average weighted distance (AWD) 
technique of Kroll and Vogel (2002) is used to measure the differences between sample and theoretical 
L-moment ratios. The AWD is defined by: 

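$$\mathrm{AWD} = \frac{\sum_{i=1}^{N} n_i d_i}{\sum_{i=1}^{N} n_i}$$ 

Here, $d_i$ is the difference between the sample and theoretical L-moment ratios at site $i$, $n_i$ is the 
record length at site $i$ and $N$ is the number of sites. 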


The distribution with the smallest AWD value provides the best fit for the sample data. 
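
The weighted average behind the AWD can be sketched in a few lines of Python. This is a minimal illustration only, assuming the record-length-weighted form in which each site's distance d_i is weighted by its number of observations n_i. The function name `average_weighted_distance` and the input numbers are hypothetical and are not taken from the study.

```python
def average_weighted_distance(distances, record_lengths):
    """Record-length-weighted mean of L-moment-ratio distances.

    distances[i]      : d_i, distance between the sample and theoretical
                        L-moment ratios at site i (assumed precomputed)
    record_lengths[i] : n_i, number of observations at site i
    """
    if len(distances) != len(record_lengths):
        raise ValueError("one distance and one record length per site")
    total = sum(record_lengths)
    if total <= 0:
        raise ValueError("record lengths must sum to a positive number")
    return sum(n * d for n, d in zip(record_lengths, distances)) / total


# Hypothetical distances for one candidate distribution at three sites,
# each with 55 years of record (equal weights in this case).
awd = average_weighted_distance([0.02, 0.05, 0.03], [55, 55, 55])
print(round(awd, 4))
```

With equal record lengths, as here, the AWD reduces to the plain mean of the distances; the weights matter only when sites have records of different length.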


Table 1: Probability distributions considered in the survey. 


2.3 Goodness-of-fit test 


The goodness-of-fit test examines the validity of an estimated probability distribution model for the rainfall 
data. Graphical methods, numerical methods and formal normality tests are three techniques for checking 
normality. The empirical distribution function (EDF) tests and the normality tests measure the discrepancy 
between the empirical and theoretical distributions (Dufour et al., 1998). 

The Kolmogorov-Smirnov (K-S), Anderson-Darling (AD) and Cramer-von Mises (CvM) tests are the most 
popular EDF tests and were applied in this study (Arshad et al., 2003; Seier, 2002). The Q-Q plot, a 
graphical method, and the root mean square error (RMSE) were used to examine the best-fitted model. The 
AIC, the BIC and the log-likelihood were applied to compare the observed and estimated values (Table 2). 
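
To make these criteria concrete, the following sketch fits two of the candidate distributions (LN2 and Exp) by maximum likelihood and compares them with the K-S statistic, AIC and BIC. It is an illustration under stated assumptions rather than the procedure used in the study: the sample is synthetic, all function names are hypothetical, and the data are assumed strictly positive (the log-normal cannot accommodate zero rainfall without further treatment).

```python
import math
import random


def fit_lognormal(x):
    """Closed-form maximum likelihood estimates (mu, sigma) for LN2."""
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((t - mu) ** 2 for t in logs) / len(logs))
    return mu, sigma


def lognormal_cdf(v, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(v) - mu) / (sigma * math.sqrt(2.0))))


def lognormal_loglik(x, mu, sigma):
    return sum(
        -math.log(v * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(v) - mu) ** 2 / (2.0 * sigma ** 2)
        for v in x
    )


def exponential_cdf(v, scale):
    return 1.0 - math.exp(-v / scale)


def exponential_loglik(x, scale):
    return sum(-math.log(scale) - v / scale for v in x)


def ks_statistic(x, cdf):
    """Largest gap between the empirical CDF and the fitted CDF."""
    xs = sorted(x)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(v), cdf(v) - i / n) for i, v in enumerate(xs)
    )


def aic(loglik, k):
    """Akaike Information Criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


random.seed(1)
sample = [random.lognormvariate(5.7, 0.5) for _ in range(55)]  # synthetic data
n = len(sample)

mu, sigma = fit_lognormal(sample)
scale = sum(sample) / n  # MLE of the exponential mean

ll_ln2 = lognormal_loglik(sample, mu, sigma)
ll_exp = exponential_loglik(sample, scale)
results = {
    "LN2": {
        "ks": ks_statistic(sample, lambda v: lognormal_cdf(v, mu, sigma)),
        "aic": aic(ll_ln2, 2),
        "bic": bic(ll_ln2, 2, n),
    },
    "Exp": {
        "ks": ks_statistic(sample, lambda v: exponential_cdf(v, scale)),
        "aic": aic(ll_exp, 1),
        "bic": bic(ll_exp, 1, n),
    },
}
for name, r in results.items():
    print(name, {key: round(value, 3) for key, value in r.items()})
```

Smaller K-S, AIC and BIC values indicate a better fit, so the distribution with the lowest values would be preferred for this sample.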
Table 2: Goodness-of-fit statistic. 


3. Results and discussions 


The present study estimates the maximum monthly rainfall at each station using the best-fit probability 
distribution. These findings provide valuable direction for policy making and sound decision making. 


3.1 Data description 


The basic statistical features of the monthly maximum rainfall for each station are considered first. The 
coefficient of skewness measures the degree of asymmetry of the distribution around the mean. The summary 
statistics (minimum, maximum, mean, standard deviation (SD), coefficient of variation (CV), skewness and 
kurtosis) of monthly maximum rainfall are presented in Table 3. The annual mean of maximum rainfall is 
395.312 mm, and the seasonal means of maximum pre-monsoon, monsoon, post-monsoon and winter rainfall 
are 267.352 mm, 862.364 mm, 211.182 mm and 29.429 mm respectively. The largest monthly maximum 
rainfall is 2013 mm, and the station-wise maximum in the monsoon season varies from 763 mm to 2013 mm. 
The rainfall data for all seasons and stations are positively skewed. 

The overall kurtosis at the Bogura and Rajshahi stations is 1.299 and 1.281 respectively, which is less 
than 3, whereas at the Ishurdi station it is 4.589. In the pre-monsoon and winter seasons the pattern 
differs, as shown in Table 3. Monthly rainfall data were positively skewed, with strong positive skewness 
at all three stations. Because kurtosis below 3 indicates a platykurtic distribution and kurtosis above 3 a 
leptokurtic one, the distribution was platykurtic at the Bogura and Rajshahi stations and leptokurtic at the 
Ishurdi station. Because there is almost no rainfall in the dry season and in winter, the data for the study 
area as a whole showed a platykurtic distribution. Over all seasons, the maximum rainfall occurred in Ishurdi 
(1167 mm) and the minimum (0 mm) occurred in Bogura, Ishurdi and Rajshahi; in the monsoon season, the 
maximum rainfall occurred in Ishurdi (1167 mm) and the minimum occurred in Bogura (84 mm) and Rajshahi 
(46 mm). The maximum value of CV (178.93%) indicated a large fluctuation in the rainfall data in the 
winter season. 
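
The descriptive measures discussed above can be computed as in the sketch below. It assumes moment-based skewness and kurtosis (with kurtosis not reduced by 3, so that 3 is the reference value for the normal distribution) and the sample standard deviation; the study does not state which estimators were used, and the function name `summary_statistics` and the toy sample are hypothetical.

```python
import math


def summary_statistics(values):
    """Min, max, mean, SD, CV (%), skewness and kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    sd = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))  # sample SD
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "cv": 100.0 * sd / mean,        # coefficient of variation, percent
        "skewness": m3 / m2 ** 1.5,     # > 0 means a longer right tail
        "kurtosis": m4 / m2 ** 2,       # compare with 3 (normal distribution)
    }


toy = [1, 2, 3, 4, 10]  # hypothetical sample, not rainfall data
stats_toy = summary_statistics(toy)
print({key: round(value, 3) for key, value in stats_toy.items()})

# Consistency check against Table 3: CV = 100 * SD / mean for Bogura (overall).
cv_bogura = 100.0 * 165.331 / 143.821
print(round(cv_bogura, 2))
```

The last two lines reproduce the tabulated CV for Bogura from its mean and SD, which confirms that the CV column is expressed as a percentage of the mean.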

The three meteorological stations, Bogura, Ishurdi and Rajshahi, were selected with monthly and annual 
maximum rainfall data for the period 1964 to 2018. These stations were selected judiciously on the basis of 
their long records of monthly rainfall data (Figure 2 and Figure 3). As Figure 2 shows, an overall 
downward trend was observed for all the selected stations as well as for NW Bangladesh. 

From Figure 3, the pre-monsoon rainfall showed an upward trend. In contrast, the monsoon and 
winter rainfall showed downward trends, while the post-monsoon rainfall remained roughly constant. This 
result reveals that climatic change varies from station to station as well as from season to season. To 
adapt to this change in climate, a change in the crop calendar would be beneficial for the agricultural 
practitioners and policy makers of the region. 
Table 3: Summary statistics for rainfall (mm) at the three locations in Bangladesh. 

Station       Min    Max    Mean      SD        CV (%)    Skewness  Kurtosis 

Overall 
Bogura        0      835    143.821   165.331   114.956   1.291     1.299 
Ishurdi       0      2013   128.765   149.338   115.977   1.703     4.589 
Rajshahi      0      763    122.726   140.675   114.625   1.300     1.281 
Bangladesh    0      2013   395.312   426.761   107.956   1.096     0.545 

Pre-monsoon 
Bogura        0      416    99.212    100.923   101.725   1.172     0.784 
Ishurdi       0      470    93.758    91.465    97.554    1.401     2.089 
Rajshahi      0      301    74.382    70.316    94.534    1.068     0.501 
Bangladesh    0      915    267.352   234.094   87.560    0.851     0.266 

Monsoon 
Bogura        84     835    313.945   150.667   47.991    0.969     0.617 
Ishurdi       0      2013   275.586   145.765   52.893    1.737     6.279 
Rajshahi      46     763    272.832   129.942   47.627    0.752     0.437 
Bangladesh    320    2013   862.364   351.371   40.745    0.792     0.091 

Post-monsoon 
Bogura        0      523    75.873    103.074   135.852   2.039     4.722 
Ishurdi       0      664    68.555    93.756    136.761   3.189     15.673 
Rajshahi      0      358    66.755    81.418    121.966   1.369     1.328 
Bangladesh    0      1462   211.182   258.054   122.195   1.884     4.714 

Winter 
Bogura        0      112    9.147     16.367    178.929   3.223     13.280 
Ishurdi       0      81     10.067    17.069    169.548   2.213     4.678 
Rajshahi      0      92     10.215    15.962    156.269   2.503     8.209 
Bangladesh    0      282    29.429    43.852    149.008   2.449     8.261 


According to Figure 4, the distribution of monthly maximum rainfall at all the stations for the period 
1964-2018 shows that the maximum values occur from June to September, while a dry period occurs from 
October to March. The highest maximum rainfall occurred in Ishurdi in June 1977, in Rajshahi in July 1997 
and September 2000, and in Bogura in June of both 1973 and 1988. 

From Table 4, it seems that the LN2, P3 and LP3 distributions are all possible candidates for representing 
pre-monsoon, monsoon, post-monsoon and winter rainfall, and that LP3 is a possible candidate distribution 
for annual maximum rainfall. It is challenging to identify the best distribution for the observations from 
the moment ratios alone. Table 4 presents the AWD values and ranks of the fitted distributions. 

The LP3 distribution is the best one for monsoon and winter rainfall, with the P3 as a possible alternative 
because there is little difference between the AWD values of the LP3 and P3 distributions. The LN2 
distribution is the best for post-monsoon rainfall, with the W2 as a potential alternative. The P3 
distribution is the best for pre-monsoon rainfall, with the GEV as a potential alternative. These results are 
largely consistent with the distribution that represents annual rainfall. This is expected because annual 
rainfall is the sum of rainfall over the four seasons, and the sum of several variables following the same 
type of probability distribution is expected to keep that distribution type. 

Table 4 also presents the AWD values and the ranks of the distributions for the 12 months. It indicates 
that the P3 distribution is the best one for describing monthly rainfall statistics from June to September. 
The LP3 distribution is best for October and December, with P3 and GEV as the respective potential 
alternatives. The LN2 distribution is the best for the months of March to May, with the P3 and GEV 
distributions as potential alternatives. The AWD value provides an objective method for selecting a 
distribution type in a region. The distribution types of monthly rainfall are mainly consistent with those 
of annual and seasonal rainfall. 

Figure 3 

Figure 4: Distribution of monthly maximum rainfall for all stations; Source: Author. 

Table 4: The AWD values of different probability distributions and their ranks. 

Period          AWD values                                   Ranks 
                LN2    Exp    W2     P3     LP3    GEV       LN2  Exp  W2   P3   LP3  GEV 
Year            0.035  0.086  0.175  0.035  0.034  0.048     2    5    6    3    1    4 
Pre-monsoon     0.079  0.083  0.166  0.026  0.031  0.028     4    5    6    1    3    2 
Monsoon         0.105  0.088  0.116  0.029  0.029  0.048     5    4    6    2    1    3 
Post-monsoon    0.028  0.071  0.029  0.031  0.085  0.144     1    4    2    3    5    6 
Winter          0.109  0.065  0.115  0.043  0.041  0.048     5    4    6    2    1    3 
Jan             0.153  0.091  0.096  0.046  0.044  0.038     6    4    5    3    2    1 
Feb             -      -      -      -      -      -         5    1    6    4    2    3 
Mar             -      -      -      -      -      -         1    6    5    2    4    3 
Apr             -      -      -      -      -      -         1    5    4    3    6    2 
May             -      -      -      -      -      -         1    3    6    2    4    5 
Jun             -      -      -      -      -      -         2    4    5    1    6    3 
Jul             -      -      -      -      -      -         3    6    4    1    5    2 
Aug             -      -      -      -      -      -         2    6    3    1    5    4 
Sep             -      -      -      -      -      -         2    5    4    1    3    6 
Oct             -      -      -      -      -      -         6    4    5    2    1    3 
Nov             0.152  0.024  0.023  0.098  0.028  0.026     6    2    1    5    4    3 
Dec             0.178  0.098  0.075  0.050  0.041  0.044     6    5    4    3    1    2 

Rank 1: best-fit distribution; rank 2: alternative distribution. 
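
The ranking step is a simple ordering of the candidate distributions by AWD. As a hedged illustration, the sketch below ranks the pre-monsoon AWD values reported in Table 4; the function name `rank_by_awd` is hypothetical, and ties (which do not occur in this row) would be broken by input order rather than by any rule stated in the study.

```python
def rank_by_awd(awd_values):
    """Map each distribution name to its rank (1 = smallest AWD)."""
    ordered = sorted(awd_values, key=awd_values.get)  # stable sort
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Pre-monsoon AWD values as reported in Table 4.
pre_monsoon = {
    "LN2": 0.079,
    "Exp": 0.083,
    "W2": 0.166,
    "P3": 0.026,
    "LP3": 0.031,
    "GEV": 0.028,
}
ranks = rank_by_awd(pre_monsoon)
best = min(ranks, key=ranks.get)
print(ranks, best)
```

For this row the ordering places P3 first and GEV second, in agreement with the ranks listed in Table 4.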


3.2 Accuracy Measures—goodness-of-fit statistic and criteria 


The quality of fit of the probability distributions is tested using goodness-of-fit statistics. The 
Anderson-Darling (AD), Cramer-von Mises (CvM) and Kolmogorov-Smirnov (K-S) tests are used, and the 
best-fitted probability distribution for the monthly rainfall data is also judged by the maximum value of the 
log-likelihood, the minimum value of the Akaike Information Criterion (AIC) and the minimum value of the 
Bayesian Information Criterion (BIC) (Table 5). 
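
For reference, the AD and CvM statistics can be computed directly from the ordered sample and a fitted CDF. The sketch below uses the standard computational formulas for the two statistics; the function names are hypothetical, and the fitted CDF is assumed to return values strictly between 0 and 1 at every observation (otherwise the logarithms in the AD statistic are undefined).

```python
import math


def anderson_darling(x, cdf):
    """A^2 = -n - (1/n) * sum (2i-1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]."""
    xs = sorted(x)
    n = len(xs)
    u = [cdf(v) for v in xs]
    total = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


def cramer_von_mises(x, cdf):
    """W^2 = 1/(12n) + sum (F(x_i) - (2i-1)/(2n))^2 over the ordered sample."""
    xs = sorted(x)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(v) - (2 * i - 1) / (2 * n)) ** 2
        for i, v in enumerate(xs, start=1)
    )


# Toy check against the uniform(0, 1) CDF, F(v) = v.
toy = [0.25, 0.75]
a2 = anderson_darling(toy, lambda v: v)
w2 = cramer_von_mises(toy, lambda v: v)
print(round(a2, 6), round(w2, 6))
```

Both statistics are zero-bounded measures of discrepancy, so smaller values indicate closer agreement between the empirical and fitted distributions; the AD statistic gives more weight to the tails, which is relevant for extreme rainfall.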

The parameters of the fitted probability distributions were estimated by the maximum likelihood method. 
The K-S, AD and CvM test statistics were computed for the 6 probability distributions for each data set. 
The probability distribution ranked first by each test, along with its test statistic, is presented in 
Table 5. LP3 using the K-S test, GEV using the AD test and P3 and LP3 using the CvM test obtained the 
first rank for maximum monthly rainfall. Thus, based on these three tests, three probable distributions were 
each identified as the best independently. 

Figure 5: CDF, histogram and density curve, and P-P plot of maximum rainfall for all meteorological stations; Source: Author. 

Table 5: Results of goodness-of-fit test statistics for different distributions of monthly maximum rainfall at three locations in NW 
Bangladesh. 

SL   Station      Mean     SD       Skewness   Goodness-of-fit test results 
                                               K-S test      AD test       CvM test 
1    Bogura       143.82   165.33   1.291      LP3 (0.068)   LP3 (0.217)   LP3 (2.052) 
2    Ishurdi      128.77   149.34   1.703      LN2 (0.072)   LP3 (0.212)   LP3 (1.107) 
3    Rajshahi     122.73   140.67   1.300      GEV (0.081)   GEV (0.283)   LP3 (3.281) 
4    Bangladesh   395.31   426.76   1.096      LP3 (0.172)   LP3 (0.147)   LP3 (3.002) 

In general, the three goodness-of-fit tests favoured different distributions. Therefore, a ranking system, 
with rank 1 as the best, rank 2 as the second best and so on, is used to select the best-fit distribution for 
each station. The distribution with the lowest sum of the three rankings is the best-fit distribution for 
that station. 
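
The rank-sum rule can be sketched as follows. The ranks in this example are hypothetical and serve only to show the mechanics; the function name `best_by_rank_sum` is likewise an assumption, and no tie-breaking rule is implied beyond taking the first distribution with the lowest total.

```python
def best_by_rank_sum(ranks_by_test):
    """Sum each distribution's ranks across tests; lowest total wins."""
    totals = {}
    for ranks in ranks_by_test.values():
        for name, rank in ranks.items():
            totals[name] = totals.get(name, 0) + rank
    best = min(totals, key=totals.get)
    return best, totals


# Hypothetical ranks for three candidates under the three EDF tests.
hypothetical_ranks = {
    "K-S": {"LN2": 1, "LP3": 2, "GEV": 3},
    "AD": {"LN2": 2, "LP3": 1, "GEV": 3},
    "CvM": {"LN2": 3, "LP3": 1, "GEV": 2},
}
best, totals = best_by_rank_sum(hypothetical_ranks)
print(best, totals)
```

In this made-up case no single distribution is ranked first by every test, yet the rank sum still yields a unique choice, which is the purpose of the rule.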

According to all the goodness-of-fit tests, the LP3 distribution is the best-fitted distribution for monthly 
maximum rainfall at Bogura. For Ishurdi, the K-S test proposed the LN2 distribution, whereas the AD and 
CvM tests proposed the LP3 distribution. Overall, LP3 is the best-fitted distribution for monthly maximum 
rainfall in NW Bangladesh according to all the goodness-of-fit tests. 

Overall, the rainfall at the Bogura, Ishurdi and Rajshahi stations in NW Bangladesh was best described by 
the LP3 distribution, with the P3 and LN2 distributions also fitting the rainfall of these stations well. The 
findings will help in future planning and contribute to the welfare of the country’s population. 


4. Summary and conclusion 


We have studied a method for identifying the best-fit probability distribution for monthly maximum rainfall 
data at selected stations in NW Bangladesh. The data show that the maximum monthly rainfall ranged from 
0 mm (minimum) to 1167 mm (maximum), indicating a very large range of fluctuation during the monsoon 
season. The location-wise minimum monsoon rainfall is less than 90 mm and the maximum is greater than 
300 mm for the period 1964-2018. The maximum monsoon rainfall during this period was 1167 mm, at 
Ishurdi in 1977. The six distributions were fitted to the monthly rainfall of the three locations and the 
parameters were estimated using the maximum likelihood method. The results reveal that the best-fit 
probability distribution for monthly maximum rainfall varies from place to place. 

The LP3 and LN2 distributions were found to be the best-fitted probability distribution models for 
annual rainfall. On the basis of the goodness-of-fit criteria, the P3 distribution is the best one for 
describing rainfall statistics from June to September. The LP3 distribution is best for October and December, 
with P3 and GEV being the potential alternatives. The LN2 distribution is best for March to May, with the 
P3 and GEV distributions as potential alternatives. The distribution types of monthly rainfall are mainly 
consistent with those identified for annual and seasonal rainfall. The analytical strategies formulated in 
this study may be effective in identifying the best-fitted probability distribution of other climate 
parameters. These results show that climate change varies from station to station as well as from season to 
season. To adapt to this change, adjustments to the crop calendar would be beneficial to agricultural 
practitioners and policy makers in the region, and regional approaches are likely to be more useful for 
assessing maximum rainfall in other parts of the country. We hope that these findings may play an important 
role in the sustainable development of agriculture in Bangladesh. 